BaseDatasetClient
crawlee.base_storage_client._base_dataset_client.BaseDatasetClient
Methods
delete
Permanently delete the dataset managed by this client.
Returns None
get
Get metadata about the dataset being managed by this client.
Returns DatasetMetadata | None
get_items_as_bytes
Retrieves dataset items as bytes.
Parameters
item_format: str = 'json' (keyword-only)
offset: int | None = None (keyword-only)
limit: int | None = None (keyword-only)
desc: bool = False (keyword-only)
clean: bool = False (keyword-only)
bom: bool = False (keyword-only)
delimiter: str | None = None (keyword-only)
fields: list[str] | None = None (keyword-only)
omit: list[str] | None = None (keyword-only)
unwind: str | None = None (keyword-only)
skip_empty: bool = False (keyword-only)
skip_header_row: bool = False (keyword-only)
skip_hidden: bool = False (keyword-only)
xml_root: str | None = None (keyword-only)
xml_row: str | None = None (keyword-only)
flatten: list[str] | None = None (keyword-only)
Returns bytes
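To make the formatting parameters more concrete, here is a minimal sketch of how items could be serialized to bytes. The function items_to_bytes_sketch is hypothetical and is not part of crawlee; it covers only item_format, bom, delimiter, and skip_header_row, and the way it interprets them (UTF-8 output, an optional UTF-8 byte order mark, a comma as the default CSV delimiter) is an assumption rather than documented behavior.

```python
from __future__ import annotations

import codecs
import csv
import io
import json


def items_to_bytes_sketch(
    items: list[dict],
    *,
    item_format: str = 'json',
    bom: bool = False,
    delimiter: str | None = None,
    skip_header_row: bool = False,
) -> bytes:
    """Hypothetical helper, not a crawlee API: serialize items to UTF-8 bytes."""
    if item_format == 'json':
        text = json.dumps(items)
    elif item_format == 'csv':
        # Use the sorted union of all keys as the column order (an assumption).
        header = sorted({key for item in items for key in item})
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=header,
            delimiter=delimiter or ',',
            lineterminator='\n',
        )
        if not skip_header_row:
            writer.writeheader()
        writer.writerows(items)
        text = buffer.getvalue()
    else:
        raise ValueError(f'Unsupported format in this sketch: {item_format}')

    payload = text.encode('utf-8')
    # Prepend the UTF-8 byte order mark only when requested.
    return codecs.BOM_UTF8 + payload if bom else payload


rows = [{'id': 1, 'title': 'a'}, {'id': 2, 'title': 'b'}]
csv_bytes = items_to_bytes_sketch(rows, item_format='csv', delimiter=';')
json_bytes = items_to_bytes_sketch(rows, bom=True)
```

A real client may support more formats and apply the remaining parameters (fields, omit, unwind, xml_root, and so on) before serialization; this sketch leaves them out.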
iterate_items
Iterates over items in the dataset according to specified filters and sorting.
This method allows for asynchronously iterating through dataset items while applying various filters such as skipping empty items, hiding specific fields, and sorting. It supports pagination via the offset and limit parameters, and can modify the appearance of dataset items using the fields, omit, unwind, skip_empty, and skip_hidden parameters.
Parameters
offset: int = 0 (keyword-only)
limit: int | None = None (keyword-only)
clean: bool = False (keyword-only)
desc: bool = False (keyword-only)
fields: list[str] | None = None (keyword-only)
omit: list[str] | None = None (keyword-only)
unwind: str | None = None (keyword-only)
skip_empty: bool = False (keyword-only)
skip_hidden: bool = False (keyword-only)
Returns AsyncIterator[dict]
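The following sketch illustrates how an asynchronous iterator with this kind of signature might be consumed with async for. InMemoryDatasetSketch is a hypothetical stand-in, not a crawlee class. The filter semantics it uses are assumptions: hidden fields are those whose names start with '#', offset and limit are applied before empty items are skipped, and clean and unwind are left out.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator


class InMemoryDatasetSketch:
    """Hypothetical stand-in that mirrors part of the documented signature."""

    def __init__(self, items: list[dict]) -> None:
        self._items = items

    async def iterate_items(
        self,
        *,
        offset: int = 0,
        limit: int | None = None,
        desc: bool = False,
        fields: list[str] | None = None,
        omit: list[str] | None = None,
        skip_empty: bool = False,
        skip_hidden: bool = False,
    ) -> AsyncIterator[dict]:
        # Sort first, then paginate (assumed order of operations).
        ordered = list(reversed(self._items)) if desc else list(self._items)
        end = None if limit is None else offset + limit
        for item in ordered[offset:end]:
            if skip_hidden:
                # Assumption: hidden fields are those starting with '#'.
                item = {k: v for k, v in item.items() if not k.startswith('#')}
            if fields is not None:
                item = {k: v for k, v in item.items() if k in fields}
            if omit is not None:
                item = {k: v for k, v in item.items() if k not in omit}
            if skip_empty and not item:
                continue
            yield item


async def main() -> list[dict]:
    client = InMemoryDatasetSketch(
        [
            {'url': 'https://example.com/1', '#debug': 'x'},
            {},
            {'url': 'https://example.com/2', '#debug': 'y'},
            {'url': 'https://example.com/3', '#debug': 'z'},
        ]
    )
    # All arguments must be passed by keyword.
    return [
        item
        async for item in client.iterate_items(
            offset=1, limit=2, skip_empty=True, skip_hidden=True
        )
    ]


collected = asyncio.run(main())
```

Because the empty item falls inside the requested window and is then skipped, the sketch yields fewer items than limit; a real client may behave differently.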
list_items
Retrieves a paginated list of items from a dataset based on various filtering parameters.
This method provides the flexibility to filter, sort, and modify the appearance of dataset items when listed. Each parameter modifies the result set according to its purpose. The method also supports pagination through 'offset' and 'limit' parameters.
Parameters
offset: int | None = 0 (keyword-only)
limit: int | None = _LIST_ITEMS_LIMIT (keyword-only)
clean: bool = False (keyword-only)
desc: bool = False (keyword-only)
fields: list[str] | None = None (keyword-only)
omit: list[str] | None = None (keyword-only)
unwind: str | None = None (keyword-only)
skip_empty: bool = False (keyword-only)
skip_hidden: bool = False (keyword-only)
flatten: list[str] | None = None (keyword-only)
view: str | None = None (keyword-only)
Returns DatasetItemsListPage
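As an illustration of offset and limit pagination, the sketch below walks through a dataset page by page. PageSketch and ListClientSketch are hypothetical stand-ins written for this example; the field names items, offset, limit, and total are assumptions about what a page object might expose, not a statement of the DatasetItemsListPage schema.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class PageSketch:
    """Hypothetical page object; the field names are assumptions."""

    items: list[dict]
    offset: int
    limit: int
    total: int


class ListClientSketch:
    """Hypothetical in-memory client exposing a list_items-like method."""

    def __init__(self, items: list[dict]) -> None:
        self._items = items

    async def list_items(
        self,
        *,
        offset: int | None = 0,
        limit: int | None = 2,
        desc: bool = False,
    ) -> PageSketch:
        start = offset or 0
        ordered = list(reversed(self._items)) if desc else list(self._items)
        size = len(ordered) if limit is None else limit
        return PageSketch(
            items=ordered[start : start + size],
            offset=start,
            limit=size,
            total=len(ordered),
        )


async def fetch_all(client: ListClientSketch, page_size: int) -> list[dict]:
    """Request consecutive pages until every item has been collected."""
    collected: list[dict] = []
    offset = 0
    while True:
        page = await client.list_items(offset=offset, limit=page_size)
        collected.extend(page.items)
        offset += len(page.items)
        # Stop on an empty page or once the reported total is reached.
        if not page.items or offset >= page.total:
            break
    return collected


all_items = asyncio.run(
    fetch_all(ListClientSketch([{'n': i} for i in range(5)]), page_size=2)
)
```

Advancing the offset by the number of items actually returned, rather than by the requested limit, keeps the loop correct when the last page is short.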
push_items
Push items to the dataset.
Parameters
items: JsonSerializable
Returns None
stream_items
Retrieves dataset items as a streaming response.
Parameters
item_format: str = 'json' (keyword-only)
offset: int | None = None (keyword-only)
limit: int | None = None (keyword-only)
desc: bool = False (keyword-only)
clean: bool = False (keyword-only)
bom: bool = False (keyword-only)
delimiter: str | None = None (keyword-only)
fields: list[str] | None = None (keyword-only)
omit: list[str] | None = None (keyword-only)
unwind: str | None = None (keyword-only)
skip_empty: bool = False (keyword-only)
skip_header_row: bool = False (keyword-only)
skip_hidden: bool = False (keyword-only)
xml_root: str | None = None (keyword-only)
xml_row: str | None = None (keyword-only)
Returns AsyncContextManager[Response | None]
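Since the return type is an asynchronous context manager that may yield None, callers would typically use async with and check the yielded value before reading. The sketch below shows that pattern only. ResponseSketch, its iter_chunks method, and StreamClientSketch are hypothetical; this reference does not specify the concrete Response type or when None is yielded.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class ResponseSketch:
    """Hypothetical stand-in for the Response type; not a real library class."""

    def __init__(self, chunks: list[bytes]) -> None:
        self._chunks = chunks

    async def iter_chunks(self) -> AsyncIterator[bytes]:
        for chunk in self._chunks:
            yield chunk


class StreamClientSketch:
    """Hypothetical client whose stream_items yields a response or None."""

    def __init__(self, payload: bytes | None) -> None:
        self._payload = payload

    @asynccontextmanager
    async def stream_items(
        self, *, item_format: str = 'json'
    ) -> AsyncIterator[ResponseSketch | None]:
        if self._payload is None:
            # Assumption: None signals that there is nothing to stream.
            yield None
            return
        size = 8
        chunks = [
            self._payload[i : i + size]
            for i in range(0, len(self._payload), size)
        ]
        yield ResponseSketch(chunks)


async def read_stream(client: StreamClientSketch) -> bytes | None:
    async with client.stream_items(item_format='json') as response:
        if response is None:
            return None
        # Consume the body inside the context, while the stream is open.
        return b''.join([chunk async for chunk in response.iter_chunks()])


streamed = asyncio.run(read_stream(StreamClientSketch(b'[{"id": 1}, {"id": 2}]')))
missing = asyncio.run(read_stream(StreamClientSketch(None)))
```

Reading the body before leaving the async with block matters for a real streaming response, because the underlying connection is usually released on exit.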
update
Update the dataset metadata.
Parameters
name: str | None = None (keyword-only)
Returns DatasetMetadata
Abstract base class for dataset resource clients.
These clients are specific to the type of resource they manage and operate under a designated storage client, like a memory storage client.
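The relationship between an abstract base class and a storage-specific client can be sketched as follows. DatasetClientSketch and MemoryDatasetClientSketch are hypothetical, trimmed-down illustrations covering only push_items, get, and delete. They do not reproduce the real class: items are simplified to a list of dicts, and metadata is a plain dict instead of DatasetMetadata.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod


class DatasetClientSketch(ABC):
    """Hypothetical, reduced interface with three of the documented methods."""

    @abstractmethod
    async def push_items(self, items: list[dict]) -> None: ...

    @abstractmethod
    async def get(self) -> dict | None: ...

    @abstractmethod
    async def delete(self) -> None: ...


class MemoryDatasetClientSketch(DatasetClientSketch):
    """Hypothetical memory-backed implementation of the sketch interface."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._items: list[dict] | None = []

    async def push_items(self, items: list[dict]) -> None:
        if self._items is None:
            raise RuntimeError('dataset was deleted')
        self._items.extend(items)

    async def get(self) -> dict | None:
        # Return None once the dataset no longer exists.
        if self._items is None:
            return None
        return {'name': self._name, 'item_count': len(self._items)}

    async def delete(self) -> None:
        self._items = None


async def demo() -> tuple[dict | None, dict | None]:
    client = MemoryDatasetClientSketch('products')
    await client.push_items([{'id': 1}, {'id': 2}])
    before = await client.get()
    await client.delete()
    after = await client.get()
    return before, after


before_delete, after_delete = asyncio.run(demo())
```

The abstract class cannot be instantiated directly; only a subclass that implements every abstract method can, which is what lets different storage backends share one interface.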