DatasetClient

crawlee.memory_storage_client.dataset_client.DatasetClient

Subclient for manipulating a single dataset.

Constructors

__init__

  • __init__(*, memory_storage_client, id, name, created_at, accessed_at, modified_at, item_count): None
  • Parameters

    • memory_storage_client: MemoryStorageClient (keyword-only)
    • id: str | None = None (keyword-only)
    • name: str | None = None (keyword-only)
    • created_at: datetime | None = None (keyword-only)
    • accessed_at: datetime | None = None (keyword-only)
    • modified_at: datetime | None = None (keyword-only)
    • item_count: int = 0 (keyword-only)

    Returns None

Methods

delete

  • async delete(): None
  • Returns None

get

  • async get(): DatasetMetadata | None
  • Returns DatasetMetadata | None

get_items_as_bytes

  • async get_items_as_bytes(*, item_format, offset, limit, desc, clean, bom, delimiter, fields, omit, unwind, skip_empty, skip_header_row, skip_hidden, xml_root, xml_row, flatten): bytes
  • Parameters

    • item_format: str = 'json' (keyword-only)
    • offset: int | None = None (keyword-only)
    • limit: int | None = None (keyword-only)
    • desc: bool = False (keyword-only)
    • clean: bool = False (keyword-only)
    • bom: bool = False (keyword-only)
    • delimiter: str | None = None (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool = False (keyword-only)
    • skip_header_row: bool = False (keyword-only)
    • skip_hidden: bool = False (keyword-only)
    • xml_root: str | None = None (keyword-only)
    • xml_row: str | None = None (keyword-only)
    • flatten: list[str] | None = None (keyword-only)

    Returns bytes

get_start_and_end_indexes

  • get_start_and_end_indexes(offset, limit): tuple[int, int]
  • Calculate the start and end indexes for listing items.


    Parameters

    • offset: int
    • limit: int | None = None

    Returns tuple[int, int]

iterate_items

  • async iterate_items(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden): AsyncIterator[dict]
  • Parameters

    • offset: int = 0 (keyword-only)
    • limit: int | None = None (keyword-only)
    • clean: bool = False (keyword-only)
    • desc: bool = False (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool = False (keyword-only)
    • skip_hidden: bool = False (keyword-only)

    Returns AsyncIterator[dict]

list_items

  • async list_items(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden, flatten, view): DatasetItemsListPage
  • Parameters

    • offset: int | None = 0 (keyword-only)
    • limit: int | None = _LIST_ITEMS_LIMIT (keyword-only)
    • clean: bool = False (keyword-only)
    • desc: bool = False (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool = False (keyword-only)
    • skip_hidden: bool = False (keyword-only)
    • flatten: list[str] | None = None (keyword-only)
    • view: str | None = None (keyword-only)

    Returns DatasetItemsListPage

push_items

  • async push_items(items): None
  • Parameters

    • items: JSONSerializable

    Returns None

stream_items

  • async stream_items(*, item_format, offset, limit, desc, clean, bom, delimiter, fields, omit, unwind, skip_empty, skip_header_row, skip_hidden, xml_root, xml_row): AsyncContextManager[Response | None]
  • Parameters

    • item_format: str = 'json' (keyword-only)
    • offset: int | None = None (keyword-only)
    • limit: int | None = None (keyword-only)
    • desc: bool = False (keyword-only)
    • clean: bool = False (keyword-only)
    • bom: bool = False (keyword-only)
    • delimiter: str | None = None (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool = False (keyword-only)
    • skip_header_row: bool = False (keyword-only)
    • skip_hidden: bool = False (keyword-only)
    • xml_root: str | None = None (keyword-only)
    • xml_row: str | None = None (keyword-only)

    Returns AsyncContextManager[Response | None]

update

  • async update(*, name): DatasetMetadata
  • Parameters

    • name: str | None = None (keyword-only)

    Returns DatasetMetadata

update_timestamps

  • async update_timestamps(*, has_been_modified): None
  • Update the timestamps of the dataset.


    Parameters

    • has_been_modified: bool (keyword-only)

    Returns None
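
One plausible reading of the `has_been_modified` flag is sketched below. The `Timestamps` class and its `touch` method are hypothetical and exist only for this example. The sketch assumes that the access time is refreshed on every call while the modification time changes only when the flag is true; the real method may also persist metadata, which is omitted here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Timestamps:
    """Hypothetical holder for a dataset's access and modification times."""

    accessed_at: datetime
    modified_at: datetime

    def touch(self, *, has_been_modified: bool) -> None:
        now = datetime.now(timezone.utc)
        # Assumption: every call counts as an access.
        self.accessed_at = now
        # Assumption: the modification time moves only on writes.
        if has_been_modified:
            self.modified_at = now


epoch = datetime(2024, 1, 1, tzinfo=timezone.utc)
stamps = Timestamps(accessed_at=epoch, modified_at=epoch)
stamps.touch(has_been_modified=False)  # a read: only accessed_at changes
```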

Properties

resource_directory

resource_directory: str

Get the resource directory for the client.

resource_info

resource_info: DatasetMetadata

Get the resource info for the dataset client.