BaseDatasetClient

crawlee.base_storage_client._base_dataset_client.BaseDatasetClient

Abstract base class for dataset resource clients.

Each client manages one type of resource and operates under a designated storage client, such as a memory storage client.
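The abstract-base-class pattern described above can be sketched as follows. This is a minimal illustration, not crawlee code: `ToyBaseDatasetClient` and `ToyMemoryDatasetClient` are hypothetical names, metadata is modeled as a plain dict rather than `DatasetMetadata`, and returning `None` from `get` after deletion is an assumption suggested by the `DatasetMetadata | None` return type.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod


class ToyBaseDatasetClient(ABC):
    """Hypothetical, trimmed-down analogue of the base class (not crawlee code)."""

    @abstractmethod
    async def get(self) -> dict | None: ...

    @abstractmethod
    async def push_items(self, items: list[dict]) -> None: ...

    @abstractmethod
    async def delete(self) -> None: ...


class ToyMemoryDatasetClient(ToyBaseDatasetClient):
    """Hypothetical in-memory implementation of the toy interface."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._items: list[dict] = []
        self._deleted = False

    async def get(self) -> dict | None:
        # Assumption: a deleted dataset has no metadata to report.
        if self._deleted:
            return None
        return {"name": self._name, "item_count": len(self._items)}

    async def push_items(self, items: list[dict]) -> None:
        self._items.extend(items)

    async def delete(self) -> None:
        self._items.clear()
        self._deleted = True


async def demo() -> tuple[dict | None, dict | None]:
    client = ToyMemoryDatasetClient("products")
    await client.push_items([{"id": 1}, {"id": 2}])
    before = await client.get()
    await client.delete()
    after = await client.get()
    return before, after
```

Calling `asyncio.run(demo())` returns the metadata before and after deletion. The abstract class itself cannot be instantiated, so every storage backend has to supply its own implementation of each method.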

Methods

delete

  • async delete(): None
  • Permanently delete the dataset managed by this client.


    Returns None

get

  • async get(): DatasetMetadata | None
  • Get metadata about the dataset being managed by this client.


    Returns DatasetMetadata | None

get_items_as_bytes

  • async get_items_as_bytes(*, item_format, offset, limit, desc, clean, bom, delimiter, fields, omit, unwind, skip_empty, skip_header_row, skip_hidden, xml_root, xml_row, flatten): bytes
  • Retrieves dataset items as bytes.


    Parameters

    • item_format: str = 'json' (keyword-only)
    • offset: int | None = None (keyword-only)
    • limit: int | None = None (keyword-only)
    • desc: bool = False (keyword-only)
    • clean: bool = False (keyword-only)
    • bom: bool = False (keyword-only)
    • delimiter: str | None = None (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool = False (keyword-only)
    • skip_header_row: bool = False (keyword-only)
    • skip_hidden: bool = False (keyword-only)
    • xml_root: str | None = None (keyword-only)
    • xml_row: str | None = None (keyword-only)
    • flatten: list[str] | None = None (keyword-only)

    Returns bytes
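To give a feel for how `item_format`, `bom`, `delimiter`, and `skip_header_row` might interact, here is a minimal sketch that serializes a list of dicts to bytes. The helper `items_to_bytes` is hypothetical and the parameter semantics are assumptions inferred from the parameter names; the page above does not specify them, and a real client may behave differently.

```python
from __future__ import annotations

import csv
import io
import json


def items_to_bytes(
    items: list[dict],
    *,
    item_format: str = "json",
    bom: bool = False,
    delimiter: str | None = None,
    skip_header_row: bool = False,
) -> bytes:
    """Hypothetical serializer; only 'json' and 'csv' are sketched here."""
    if item_format == "json":
        text = json.dumps(items)
    elif item_format == "csv":
        buffer = io.StringIO()
        # Assumption: columns are the sorted union of all keys.
        columns = sorted({key for item in items for key in item})
        writer = csv.DictWriter(
            buffer,
            fieldnames=columns,
            delimiter=delimiter or ",",
            lineterminator="\n",
        )
        if not skip_header_row:
            writer.writeheader()
        writer.writerows(items)
        text = buffer.getvalue()
    else:
        raise ValueError(f"Unsupported item_format: {item_format!r}")

    # Assumption: bom=True prepends a UTF-8 byte order mark.
    prefix = "\ufeff" if bom else ""
    return (prefix + text).encode("utf-8")
```

For example, `items_to_bytes([{"id": 1, "name": "a"}], item_format="csv", delimiter=";")` yields a header row followed by one data row, both separated by semicolons.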

iterate_items

  • async iterate_items(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden): AsyncIterator[dict]
  • Iterates over items in the dataset according to specified filters and sorting.

    This method asynchronously iterates through dataset items while applying filters, such as skipping empty items and hiding specific fields, and sorting the results. It supports pagination via the offset and limit parameters, and can modify the appearance of dataset items using the fields, omit, unwind, skip_empty, and skip_hidden parameters.


    Parameters

    • offset: int = 0 (keyword-only)
    • limit: int | None = None (keyword-only)
    • clean: bool = False (keyword-only)
    • desc: bool = False (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool = False (keyword-only)
    • skip_hidden: bool = False (keyword-only)

    Returns AsyncIterator[dict]
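The following sketch shows one plausible way the filtering and pagination parameters could combine in an async iterator. `ToyDatasetClient` and `collect` are hypothetical and not part of crawlee. The order of operations (sort, then paginate, then shape each item) and the meaning of each parameter are assumptions, and only a subset of the parameters is modeled.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator


class ToyDatasetClient:
    """Hypothetical in-memory stand-in; not part of crawlee."""

    def __init__(self, items: list[dict]) -> None:
        self._items = items

    async def iterate_items(
        self,
        *,
        offset: int = 0,
        limit: int | None = None,
        desc: bool = False,
        fields: list[str] | None = None,
        omit: list[str] | None = None,
        skip_empty: bool = False,
    ) -> AsyncIterator[dict]:
        # Assumed semantics: order first, then paginate, then shape each item.
        ordered = list(reversed(self._items)) if desc else list(self._items)
        end = None if limit is None else offset + limit
        for item in ordered[offset:end]:
            if fields is not None:
                item = {key: item[key] for key in fields if key in item}
            if omit is not None:
                item = {key: value for key, value in item.items() if key not in omit}
            if skip_empty and not item:
                continue
            yield item


async def collect(client: ToyDatasetClient, **kwargs) -> list[dict]:
    # Drain the async iterator into a list for easy inspection.
    return [item async for item in client.iterate_items(**kwargs)]
```

A caller would normally consume the iterator with `async for item in client.iterate_items(...)`; the `collect` helper exists only to make the result easy to inspect.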

list_items

  • async list_items(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden, flatten, view): DatasetItemsListPage
  • Retrieves a paginated list of items from a dataset based on various filtering parameters.

    This method can filter, sort, and modify the appearance of dataset items when listing them. Pagination is supported through the offset and limit parameters.


    Parameters

    • offset: int | None = 0 (keyword-only)
    • limit: int | None = _LIST_ITEMS_LIMIT (keyword-only)
    • clean: bool = False (keyword-only)
    • desc: bool = False (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool = False (keyword-only)
    • skip_hidden: bool = False (keyword-only)
    • flatten: list[str] | None = None (keyword-only)
    • view: str | None = None (keyword-only)

    Returns DatasetItemsListPage
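Offset-and-limit pagination is typically consumed in a loop that advances the offset until every item has been seen. The sketch below illustrates that loop against a toy client. `Page`, `ToyListClient`, and `fetch_all` are hypothetical, and the `items` and `total` fields on `Page` are assumed for illustration rather than taken from `DatasetItemsListPage`.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for DatasetItemsListPage; field names are assumed."""

    items: list[dict]
    total: int


class ToyListClient:
    """Hypothetical in-memory client exposing a list_items-like method."""

    def __init__(self, items: list[dict]) -> None:
        self._items = items

    async def list_items(self, *, offset: int | None = 0, limit: int | None = None) -> Page:
        start = offset or 0
        end = None if limit is None else start + limit
        return Page(items=self._items[start:end], total=len(self._items))


async def fetch_all(client: ToyListClient, page_size: int) -> list[dict]:
    """Walk through every page by advancing the offset."""
    collected: list[dict] = []
    offset = 0
    while True:
        page = await client.list_items(offset=offset, limit=page_size)
        collected.extend(page.items)
        offset += len(page.items)
        # Stop on an empty page or once the reported total has been reached.
        if not page.items or offset >= page.total:
            return collected
```

Stopping on an empty page as well as on the reported total guards against looping forever if the dataset shrinks while it is being read.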

push_items

  • async push_items(items): None
  • Push items to the dataset.


    Parameters

    • items: JsonSerializable

    Returns None

stream_items

  • async stream_items(*, item_format, offset, limit, desc, clean, bom, delimiter, fields, omit, unwind, skip_empty, skip_header_row, skip_hidden, xml_root, xml_row): AsyncContextManager[Response | None]
  • Retrieves dataset items as a streaming response.


    Parameters

    • item_format: str = 'json' (keyword-only)
    • offset: int | None = None (keyword-only)
    • limit: int | None = None (keyword-only)
    • desc: bool = False (keyword-only)
    • clean: bool = False (keyword-only)
    • bom: bool = False (keyword-only)
    • delimiter: str | None = None (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool = False (keyword-only)
    • skip_header_row: bool = False (keyword-only)
    • skip_hidden: bool = False (keyword-only)
    • xml_root: str | None = None (keyword-only)
    • xml_row: str | None = None (keyword-only)

    Returns AsyncContextManager[Response | None]

update

  • async update(*, name): DatasetMetadata
  • Update the dataset metadata.


    Parameters

    • name: str | None = None (keyword-only)

    Returns DatasetMetadata