MemoryDatasetClient

Memory implementation of the dataset client.

This client stores dataset items in memory using Python lists and dictionaries. No data is persisted between process runs, meaning all stored data is lost when the program terminates. This implementation is primarily useful for testing, development, and short-lived crawler operations where persistent storage is not required.

The memory implementation provides fast access to data but is limited by available memory and does not support data sharing across different processes. It supports all dataset operations including sorting, filtering, and pagination, but performs them entirely in memory.

Methods

__init__

  • __init__(*, metadata): None
  • Initialize a new instance.

    Prefer the MemoryDatasetClient.open class method for creating new instances.


    Parameters

    Returns None

drop

  • async drop(): None
  • Drop the whole dataset and remove all its items.

    The backend method for the Dataset.drop call.


    Returns None

get_data

  • async get_data(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden, flatten, view): DatasetItemsListPage
  • Get data from the dataset with various filtering options.

    The backend method for the Dataset.get_data call.


    Parameters

    • offset: int = 0 (optional, keyword-only)
    • limit: int | None = 999_999_999_999 (optional, keyword-only)
    • clean: bool = False (optional, keyword-only)
    • desc: bool = False (optional, keyword-only)
    • fields: list[str] | None = None (optional, keyword-only)
    • omit: list[str] | None = None (optional, keyword-only)
    • unwind: str | None = None (optional, keyword-only)
    • skip_empty: bool = False (optional, keyword-only)
    • skip_hidden: bool = False (optional, keyword-only)
    • flatten: list[str] | None = None (optional, keyword-only)
    • view: str | None = None (optional, keyword-only)

    Returns DatasetItemsListPage
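
    The interplay of offset, limit, desc, and skip_empty is easier to see on a plain list. The sketch below is not the library's implementation. `paginate` is a hypothetical helper, and the order of operations (drop empty items, then reverse, then slice) is an assumption made for illustration.

```python
from __future__ import annotations

from typing import Any


def paginate(
    items: list[dict[str, Any]],
    *,
    offset: int = 0,
    limit: int | None = None,
    desc: bool = False,
    skip_empty: bool = False,
) -> list[dict[str, Any]]:
    """Hypothetical helper: select one page from an in-memory list of items."""
    # Work on a copy so the stored list is never mutated.
    selected = list(items)
    if skip_empty:
        # Assumed meaning of skip_empty: drop items that are empty dicts.
        selected = [item for item in selected if item]
    if desc:
        # Assumed meaning of desc: newest-first, i.e. reversed insertion order.
        selected.reverse()
    end = None if limit is None else offset + limit
    return selected[offset:end]


stored = [{"n": 1}, {}, {"n": 2}, {"n": 3}]
page = paginate(stored, offset=1, limit=2, desc=True, skip_empty=True)
print(page)
```

    Because everything happens on a list held in memory, each call costs time proportional to the number of stored items, which matches the note above that the memory client is bounded by available memory.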

get_metadata

iterate_items

  • async iterate_items(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden): AsyncIterator[dict[str, Any]]
  • Iterate over the dataset items with filtering options.

    The backend method for the Dataset.iterate_items call.


    Parameters

    • offset: int = 0 (optional, keyword-only)
    • limit: int | None = None (optional, keyword-only)
    • clean: bool = False (optional, keyword-only)
    • desc: bool = False (optional, keyword-only)
    • fields: list[str] | None = None (optional, keyword-only)
    • omit: list[str] | None = None (optional, keyword-only)
    • unwind: str | None = None (optional, keyword-only)
    • skip_empty: bool = False (optional, keyword-only)
    • skip_hidden: bool = False (optional, keyword-only)

    Returns AsyncIterator[dict[str, Any]]
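
    A method that returns AsyncIterator[dict[str, Any]] is consumed with `async for`. The following is a minimal sketch of that pattern using a hypothetical `iterate` async generator over a plain list. It only mirrors the offset, limit, and desc parameters and is not the client's actual code.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from typing import Any


async def iterate(
    records: list[dict[str, Any]],
    *,
    offset: int = 0,
    limit: int | None = None,
    desc: bool = False,
) -> AsyncIterator[dict[str, Any]]:
    """Hypothetical async generator yielding items from an in-memory list."""
    # Assumption: reversal is applied before offset and limit.
    ordered = list(reversed(records)) if desc else list(records)
    end = None if limit is None else offset + limit
    for item in ordered[offset:end]:
        yield item


async def collect() -> list[dict[str, Any]]:
    records = [{"id": 1}, {"id": 2}, {"id": 3}]
    # Skip the first item and gather the rest with an async comprehension.
    return [item async for item in iterate(records, offset=1)]


result = asyncio.run(collect())
print(result)
```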

open

  • Open or create a new memory dataset client.

    This method creates a new in-memory dataset instance. Unlike persistent storage implementations, memory datasets don't check for existing datasets with the same name or ID since all data exists only in memory and is lost when the process terminates.


    Parameters

    • id: str | None (keyword-only)

      The ID of the dataset. If not provided, a random ID will be generated.

    • name: str | None (keyword-only)

      The name of the dataset. If not provided, the dataset will be unnamed.

    Returns MemoryDatasetClient

purge

  • async purge(): None
  • Purge all items from the dataset.

    The backend method for the Dataset.purge call.


    Returns None

push_data

  • async push_data(data): None
  • Push data to the dataset.

    The backend method for the Dataset.push_data call.


    Parameters

    • data: list[Any] | dict[str, Any]

    Returns None
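
    Since push_data accepts either a list or a single dict, an in-memory backend has to normalize both shapes before storing them. The sketch below shows one way to do that, together with a purge that empties the store. `TinyMemoryStore` is a hypothetical stand-in written for this example and does not reflect the internals of MemoryDatasetClient.

```python
from __future__ import annotations

import asyncio
from typing import Any


class TinyMemoryStore:
    """Hypothetical stand-in for an in-memory dataset backend."""

    def __init__(self) -> None:
        self.records: list[Any] = []

    async def push_data(self, data: list[Any] | dict[str, Any]) -> None:
        # A list is treated as many items, a dict as a single item.
        if isinstance(data, list):
            self.records.extend(data)
        else:
            self.records.append(data)

    async def purge(self) -> None:
        # Remove all items but keep the store itself usable.
        self.records.clear()


async def demo() -> tuple[int, int]:
    store = TinyMemoryStore()
    await store.push_data({"url": "https://example.com/a"})
    await store.push_data(
        [{"url": "https://example.com/b"}, {"url": "https://example.com/c"}]
    )
    count_after_push = len(store.records)
    await store.purge()
    return count_after_push, len(store.records)


counts = asyncio.run(demo())
print(counts)
```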