KeyValueStore

Represents a key-value based storage for reading and writing data records or files.

Each data record is identified by a unique key and associated with a specific MIME content type. This class is commonly used in crawler runs to store inputs and outputs, typically in JSON format, but it also supports other content types.

Data can be stored either locally or in the cloud, depending on how the underlying storage client is set up. By default, a MemoryStorageClient is used, but it can be replaced with a different one.

By default, data is stored using the following path structure:

{CRAWLEE_STORAGE_DIR}/key_value_stores/{STORE_ID}/{KEY}.{EXT}
  • {CRAWLEE_STORAGE_DIR}: The root directory for all storage data, specified by the CRAWLEE_STORAGE_DIR environment variable.
  • {STORE_ID}: The identifier for the key-value store, either "default" or as specified by CRAWLEE_DEFAULT_KEY_VALUE_STORE_ID.
  • {KEY}: The unique key for the record.
  • {EXT}: The file extension corresponding to the MIME type of the content.
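
As a rough illustration of the path structure above, the following sketch assembles a record path from its four parts. The helper `record_path` is hypothetical and not part of the crawlee API; the extension is passed in directly rather than derived from a MIME type.

```python
from pathlib import PurePosixPath


def record_path(storage_dir: str, store_id: str, key: str, ext: str) -> PurePosixPath:
    """Build a record path following the layout described above.

    Illustrative helper only; it is not provided by crawlee.
    """
    return PurePosixPath(storage_dir) / "key_value_stores" / store_id / f"{key}.{ext}"


# Example values are made up for illustration.
print(record_path("storage", "default", "INPUT", "json"))
# prints: storage/key_value_stores/default/INPUT.json
```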

To open a key-value store, use the open class method, providing an id, name, or optional configuration. If none are specified, the default store for the current crawler run is used. Attempting to open a store by id that does not exist will raise an error; however, if accessed by name, the store will be created if it does not already exist.

Usage

from crawlee.storages import KeyValueStore

kvs = await KeyValueStore.open(name='my_kvs')

Methods

__init__

  • __init__(*, id, name, configuration, client): None
  • Parameters

    • id: str (optional, keyword-only)
    • name: str | None (optional, keyword-only)
    • configuration: Configuration (optional, keyword-only)
    • client: BaseStorageClient (optional, keyword-only)

    Returns None

drop

  • async drop(): None
  • Drop the storage. Remove it from the underlying storage and delete it from the cache.


    Returns None

get_info

  • Get an object containing general information about the key-value store.


    Returns KeyValueStoreMetadata | None

get_public_url

  • async get_public_url(*, key): str
  • Get the public URL for the given key.


    Parameters

    • key: str (optional, keyword-only)

      Key of the record for which the URL is required.

    Returns str

get_value

  • async get_value(*, key): Any
  • Parameters

    • key: str (optional, keyword-only)

    Returns Any

get_value

  • async get_value(*, key, default_value): T
  • Parameters

    • key: str (optional, keyword-only)
    • default_value: T (optional, keyword-only)

    Returns T

get_value

  • async get_value(*, key, default_value): T | None
  • Parameters

    • key: str (optional, keyword-only)
    • default_value: T | None = None (optional, keyword-only)

    Returns T | None

get_value

  • async get_value(*, key, default_value): T | None
  • Get a value from the KVS.


    Parameters

    • key: str (optional, keyword-only)

      Key of the record to retrieve.

    • default_value: T | None = None (optional, keyword-only)

      Default value returned if the record does not exist.

    Returns T | None
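
A minimal sketch of reading a record with a fallback, assuming `store` is a KeyValueStore obtained from `await KeyValueStore.open()`. The function name `load_state` and the key `crawl-state` are hypothetical and chosen only for illustration.

```python
from typing import Any


async def load_state(store: Any, key: str = "crawl-state") -> dict:
    """Return the record stored under `key`, or an empty dict if it is missing."""
    # `store` is expected to be a KeyValueStore instance.
    # default_value is returned when no record exists under the given key.
    return await store.get_value(key=key, default_value={})
```

Supplying a non-None default corresponds to the overload that returns T rather than T | None, so the caller does not need to handle a missing record separately.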

iterate_keys

  • Iterate over the existing keys in the KVS.


    Parameters

    • exclusive_start_key: str | None = None (optional, keyword-only)

      Key after which the iteration starts; the key itself is excluded.

    Returns AsyncIterator[KeyValueStoreKeyInfo]
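
A short sketch of consuming the iterator, assuming `store` is an opened KeyValueStore. The helper `list_keys` is hypothetical, and the `.key` attribute on each yielded KeyValueStoreKeyInfo is an assumption, since its fields are not listed on this page.

```python
from typing import Any


async def list_keys(store: Any) -> list[str]:
    """Collect the keys of all records in the store into a list."""
    # Assumption: each yielded KeyValueStoreKeyInfo exposes the key as `.key`.
    return [info.key async for info in store.iterate_keys()]
```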

open

  • Open a storage: either restore an existing one or create a new one.


    Parameters

    • id: str | None = None (optional, keyword-only)

      The storage ID.

    • name: str | None = None (optional, keyword-only)

      The storage name.

    • configuration: Configuration | None = None (optional, keyword-only)

      The configuration to use.

    Returns BaseStorage

set_value

  • async set_value(*, key, value, content_type): None
  • Set a value in the KVS.


    Parameters

    • key: str (optional, keyword-only)

      Key of the record to set.

    • value: Any (optional, keyword-only)

      Value to set. If None, the record is deleted.

    • content_type: str | None = None (optional, keyword-only)

      Content type of the record.

    Returns None
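
The following sketch shows one way to write a record, read it back, and then delete it by setting its value to None. It assumes `store` is an opened KeyValueStore; the helper name `save_then_clear` is hypothetical.

```python
from typing import Any


async def save_then_clear(store: Any, key: str, value: Any) -> Any:
    """Write a record, read it back, then delete it by setting it to None."""
    # content_type is left at its default of None here.
    await store.set_value(key=key, value=value)
    stored = await store.get_value(key=key)
    # Per the parameter description above, a value of None deletes the record.
    await store.set_value(key=key, value=None)
    return stored
```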

Properties

id

id: str

Get the storage ID.

name

name: str | None

Get the storage name.