KeyValueStore

crawlee.storages._key_value_store.KeyValueStore

Represents a key-value based storage for reading and writing data records or files.

Each data record is identified by a unique key and associated with a specific MIME content type. This class is commonly used in crawler runs to store inputs and outputs, typically in JSON format, but it also supports other content types.

Data can be stored either locally or in the cloud, depending on how the underlying storage client is configured. By default, a MemoryStorageClient is used, but it can be replaced with a different one.

By default, data is stored using the following path structure:

{CRAWLEE_STORAGE_DIR}/key_value_stores/{STORE_ID}/{KEY}.{EXT}
  • {CRAWLEE_STORAGE_DIR}: The root directory for all storage data, as specified by the CRAWLEE_STORAGE_DIR environment variable.
  • {STORE_ID}: The identifier for the key-value store, either "default" or as specified by CRAWLEE_DEFAULT_KEY_VALUE_STORE_ID.
  • {KEY}: The unique key for the record.
  • {EXT}: The file extension corresponding to the MIME type of the content.
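
The path structure above can be illustrated with a small sketch. The `record_path` helper is hypothetical (it is not part of Crawlee), and the argument values are only examples; it simply joins the four documented components in order.

```python
from pathlib import PurePosixPath


def record_path(storage_dir: str, store_id: str, key: str, ext: str) -> PurePosixPath:
    """Build {CRAWLEE_STORAGE_DIR}/key_value_stores/{STORE_ID}/{KEY}.{EXT}.

    Hypothetical helper for illustration only; Crawlee resolves paths internally.
    """
    return PurePosixPath(storage_dir) / 'key_value_stores' / store_id / f'{key}.{ext}'


# Example values are assumptions, not defaults taken from the library.
print(record_path('storage', 'default', 'INPUT', 'json'))
# storage/key_value_stores/default/INPUT.json
```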

To open a key-value store, use the open class method, providing an id, name, or optional configuration. If none are specified, the default store for the current crawler run is used. Attempting to open a store by id that does not exist will raise an error; however, if accessed by name, the store will be created if it does not already exist.

Usage:

kvs = await KeyValueStore.open(name='my_kvs')
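
A slightly fuller sketch of the same usage, assuming Crawlee is installed and that `KeyValueStore` can be imported from `crawlee.storages`. The helper name `save_and_load`, the key, and the value are illustrative. The helper takes the store as an argument so it can also be exercised with a stand-in object.

```python
from typing import Any


async def save_and_load(kvs: Any, key: str, value: Any) -> Any:
    """Write a record, then read it back from the same store."""
    await kvs.set_value(key, value)
    return await kvs.get_value(key)


async def main() -> None:
    # Imported here so the helper above stays importable without Crawlee installed.
    from crawlee.storages import KeyValueStore

    # Opening by name creates the store if it does not already exist.
    kvs = await KeyValueStore.open(name='my_kvs')
    print(await save_and_load(kvs, 'greeting', {'hello': 'world'}))


# To try it against a real store: import asyncio; asyncio.run(main())
```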

Constructors

__init__

  • __init__(id, name, configuration, client): None
  • Parameters

    • id: str
    • name: str | None
    • configuration: Configuration
    • client: BaseStorageClient

    Returns None

Methods

drop

  • async drop(): None
  • Returns None

get_info

  • async get_info(): KeyValueStoreMetadata | None
  • Get an object containing general information about the key-value store.


    Returns KeyValueStoreMetadata | None

get_value

  • async get_value(key): Any
  • Parameters

    • key: str

    Returns Any

get_value

  • async get_value(key, default_value): T
  • Parameters

    • key: str
    • default_value: T

    Returns T

get_value

  • async get_value(key, default_value): T | None
  • Get a value from the KVS.


    Parameters

    • key: str
    • default_value: T | None = None

    Returns T | None
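
A minimal sketch of how `default_value` might be used, assuming it is returned when no record exists under the given key. The `increment_counter` helper and the `'visits'` key are hypothetical; `kvs` is any object exposing the `get_value` and `set_value` methods documented on this page.

```python
from typing import Any


async def increment_counter(kvs: Any, key: str = 'visits') -> int:
    """Read a counter, falling back to 0, then store the incremented value."""
    # Assumption: default_value is returned when the key has no record yet.
    current = await kvs.get_value(key, 0)
    updated = current + 1
    await kvs.set_value(key, updated)
    return updated
```

Passing a default keeps the caller free of explicit `None` checks, which matches the `T` return type of the two-argument overload.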

iterate_keys

  • async iterate_keys(exclusive_start_key): AsyncIterator[KeyValueStoreKeyInfo]
  • Iterate over the existing keys in the KVS.


    Parameters

    • exclusive_start_key: str | None = None

    Returns AsyncIterator[KeyValueStoreKeyInfo]
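
A short sketch of consuming the iterator with `async for`. It assumes each `KeyValueStoreKeyInfo` item exposes a `key` attribute and that iteration resumes after `exclusive_start_key` without including it; the `list_keys` helper is hypothetical.

```python
from __future__ import annotations

from typing import Any


async def list_keys(kvs: Any, after: str | None = None) -> list[str]:
    """Collect the names of all keys, optionally starting after a given key."""
    keys: list[str] = []
    async for info in kvs.iterate_keys(exclusive_start_key=after):
        # Assumption: the key name is available as `info.key`.
        keys.append(info.key)
    return keys
```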

open

  • async open(*, id, name, configuration, storage_client): KeyValueStore
  • Parameters

    • id: str | None = None (keyword-only)
    • name: str | None = None (keyword-only)
    • configuration: Configuration | None = None (keyword-only)
    • storage_client: BaseStorageClient | None = None (keyword-only)

    Returns KeyValueStore

set_value

  • async set_value(key, value, content_type): None
  • Set a value in the KVS.


    Parameters

    • key: str
    • value: Any
    • content_type: str | None = None

    Returns None
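
A sketch of storing two records, one with an explicit MIME type and one without. The `save_page_snapshot` helper, the key naming scheme, and the sample values are hypothetical; when `content_type` is omitted, the choice of content type is left to the store.

```python
from typing import Any


async def save_page_snapshot(kvs: Any, slug: str, html: str, meta: dict) -> None:
    """Store a page's HTML and its metadata under two related keys."""
    # Explicit MIME type for the raw HTML document.
    await kvs.set_value(f'{slug}-html', html, content_type='text/html')
    # No content_type given: the store decides how to serialize the dict.
    await kvs.set_value(f'{slug}-meta', meta)
```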

Properties

id

id: str

name

name: str | None