Dataset <Data>
Index
Properties
client
readonly config
id
log
optional name
Methods
drop
Removes the dataset either from the Apify cloud storage or from the local directory, depending on the mode of operation.
Returns Promise<void>
export
Returns all the data from the dataset. This will iterate through the whole dataset via the listItems() client method, which gives you only paginated results.
Parameters
options: DatasetExportOptions = {}
Returns Promise<Data[]>
exportTo
Save the entirety of the dataset's contents into one file within a key-value store.
Parameters
key: string
The name of the value to save the data in.
optional options: DatasetExportToOptions
An optional options object where you can provide the dataset and target KVS name.
optional contentType: string
Only JSON and CSV are currently supported; defaults to JSON.
Returns Promise<Data[]>
exportToCSV
Save entire default dataset's contents into one CSV file within a key-value store.
Parameters
key: string
The name of the value to save the data in.
optional options: Omit<DatasetExportToOptions, fromDataset>
An optional options object where you can provide the target KVS name.
Returns Promise<void>
exportToJSON
Save entire default dataset's contents into one JSON file within a key-value store.
Parameters
key: string
The name of the value to save the data in.
optional options: Omit<DatasetExportToOptions, fromDataset>
An optional options object where you can provide the target KVS name.
Returns Promise<void>
forEach
Iterates over dataset items, yielding each in turn to an iteratee function. Each invocation of iteratee is called with two arguments: (item, index).
If the iteratee function returns a Promise then it is awaited before the next call. If it throws an error, the iteration is aborted and the forEach function throws the error.
Example usage
const dataset = await Dataset.open('my-results');
await dataset.forEach(async (item, index) => {
    console.log(`Item at ${index}: ${JSON.stringify(item)}`);
});
Parameters
iteratee: DatasetConsumer<Data>
A function that is called for every item in the dataset.
optional options: DatasetIteratorOptions = {}
All forEach() parameters.
optional index: number = 0
Specifies the initial index number passed to the iteratee function.
Returns Promise<void>
getData
Returns DatasetContent object holding the items in the dataset based on the provided parameters.
Parameters
options: DatasetDataOptions = {}
Returns Promise<DatasetContent<Data>>
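Since getData() returns the items selected by the provided parameters, reading a large dataset usually means requesting it in pages. The following is a minimal sketch, assuming that DatasetDataOptions accepts offset and limit and that the resolved DatasetContent exposes an items array. The helper name readAllItems and the default page size are illustrative and not part of the library.

```javascript
// Hypothetical helper: collects every item by requesting fixed-size pages.
// Assumes `dataset.getData({ offset, limit })` resolves to an object with an `items` array.
async function readAllItems(dataset, pageSize = 1000) {
    const all = [];
    let offset = 0;
    while (true) {
        const { items } = await dataset.getData({ offset, limit: pageSize });
        all.push(...items);
        // A short (or empty) page means the end of the dataset was reached.
        if (items.length < pageSize) break;
        offset += items.length;
    }
    return all;
}
```

With an opened dataset this could be called as `const items = await readAllItems(dataset);`. Paging by hand is mainly useful when each page should be processed before the next one is fetched.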
getInfo
Returns an object containing general information about the dataset.
The function returns the same object as the Apify API Client's getDataset function, which in turn calls the Get dataset API endpoint.
Example:
{
id: "WkzbQMuFYuamGv3YF",
name: "my-dataset",
userId: "wRsJZtadYvn4mBZmm",
createdAt: new Date("2015-12-12T07:34:14.202Z"),
modifiedAt: new Date("2015-12-13T08:36:13.202Z"),
accessedAt: new Date("2015-12-14T08:36:13.202Z"),
itemCount: 14,
}
Returns Promise<undefined | DatasetInfo>
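Because the promise may resolve to undefined, callers should guard before reading fields. A minimal sketch, using only the itemCount field shown in the example object; the helper name countItems is hypothetical.

```javascript
// Hypothetical helper: returns the number of stored items,
// or 0 when getInfo() yields no info object.
async function countItems(dataset) {
    const info = await dataset.getInfo();
    // Optional chaining covers the documented `undefined` result.
    return info?.itemCount ?? 0;
}
```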
map
Produces a new array of values by mapping each value in the list through a transformation function iteratee(). Each invocation of iteratee() is called with two arguments: (element, index).
If iteratee returns a Promise then it is awaited before the next call.
Parameters
iteratee: DatasetMapper<Data, R>
optional options: DatasetIteratorOptions = {}
All map() parameters.
Returns Promise<R[]>
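The calling convention described for map() can be illustrated with an in-memory stand-in. This is a sketch of the documented semantics (sequential calls, each awaited, receiving the element and its index), not the library's implementation; mapSequentially is a hypothetical name and it operates on a plain array rather than a Dataset.

```javascript
// Illustrative stand-in for the documented semantics, NOT the library's implementation.
async function mapSequentially(items, iteratee) {
    const results = [];
    for (let index = 0; index < items.length; index++) {
        // `await` handles both plain return values and Promises,
        // so the next call starts only after the previous one settles.
        results.push(await iteratee(items[index], index));
    }
    return results;
}
```

On a real dataset the equivalent call would look like `const titles = await dataset.map((item) => item.title);`, where title is an assumed field of the stored objects.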
pushData
Stores an object or an array of objects to the dataset. The function returns a promise that resolves when the operation finishes. It has no result, but throws on invalid args or other errors.
IMPORTANT: Make sure to use the await keyword when calling pushData(), otherwise the crawler process might finish before the data is stored!
The size of the data is limited by the receiving API and therefore pushData() will only allow objects whose JSON representation is smaller than 9MB. When an array is passed, none of the included objects may be larger than 9MB, but the array itself may be of any size.
The function internally chunks the array into separate items and pushes them sequentially. The chunking process is stable (keeps order of data), but it does not provide a transaction safety mechanism. Therefore, in the event of an uploading error (after several automatic retries), the function's Promise will reject and the dataset will be left in a state where some of the items have already been saved to the dataset while other items from the source array were not. To overcome this limitation, the developer may, for example, read the last item saved in the dataset and re-attempt the save of the data from this item onwards to prevent duplicates.
Parameters
data: Data | Data[]
Object or array of objects containing data to be stored in the default dataset. The objects must be serializable to JSON and the JSON representation of each object must be smaller than 9MB.
Returns Promise<void>
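Since pushData() rejects objects whose JSON representation is too large, it can help to separate oversized items before pushing. The sketch below is one possible pre-check under stated assumptions: the exact byte value behind "9MB" is assumed to be 9 * 1024 * 1024, the helper name partitionBySize is hypothetical, and every item is assumed to be serializable to JSON.

```javascript
// Assumed threshold: the text only says "9MB"; the exact byte count is an assumption.
const MAX_ITEM_BYTES = 9 * 1024 * 1024;

// Hypothetical helper: splits items into those whose JSON form fits under
// the limit and those that do not. Items must be JSON-serializable.
function partitionBySize(items, maxBytes = MAX_ITEM_BYTES) {
    const accepted = [];
    const oversized = [];
    for (const item of items) {
        // Measure the UTF-8 size of the serialized item, not its character count.
        const bytes = Buffer.byteLength(JSON.stringify(item), 'utf8');
        (bytes < maxBytes ? accepted : oversized).push(item);
    }
    return { accepted, oversized };
}
```

A caller could then run `await dataset.pushData(accepted);` and decide separately how to trim or log the oversized items.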
reduce
Reduces a list of values down to a single value.
Memo is the initial state of the reduction, and each successive step of it should be returned by iteratee(). The iteratee() is passed three arguments: the memo, then the value and index of the iteration.
If no memo is passed to the initial invocation of reduce, the iteratee() is not invoked on the first element of the list. The first element is instead passed as the memo in the invocation of the iteratee() on the next element in the list.
If iteratee() returns a Promise then it is awaited before the next call.
Parameters
iteratee: DatasetReducer<T, Data>
memo: T
Initial state of the reduction.
optional options: DatasetIteratorOptions = {}
All reduce() parameters.
Returns Promise<T>
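The memo handling described for reduce() can be illustrated with an in-memory stand-in. This is a sketch of the documented semantics, not the library's implementation: reduceSequentially is a hypothetical name, it works on a plain array, and it treats an undefined memo as "no memo passed", which is a simplification.

```javascript
// Illustrative stand-in for the documented semantics, NOT the library's implementation.
async function reduceSequentially(items, iteratee, memo) {
    let index = 0;
    let acc = memo;
    // Without an initial memo, the first element seeds the reduction
    // and the iteratee is not invoked on it.
    if (acc === undefined && items.length > 0) {
        acc = items[0];
        index = 1;
    }
    for (; index < items.length; index++) {
        // Each step is awaited, so Promise-returning iteratees run one at a time.
        acc = await iteratee(acc, items[index], index);
    }
    return acc;
}
```

On a real dataset, summing an assumed numeric field price would look like `const total = await dataset.reduce((sum, item) => sum + item.price, 0);`.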
static exportToCSV
Save entire default dataset's contents into one CSV file within a key-value store.
Parameters
key: string
The name of the value to save the data in.
optional options: DatasetExportToOptions
An optional options object where you can provide the dataset and target KVS name.
Returns Promise<void>
static exportToJSON
Save entire default dataset's contents into one JSON file within a key-value store.
Parameters
key: string
The name of the value to save the data in.
optional options: DatasetExportToOptions
An optional options object where you can provide the dataset and target KVS name.
Returns Promise<void>
static getData
Returns DatasetContent object holding the items in the dataset based on the provided parameters.
Parameters
options: DatasetDataOptions = {}
Returns Promise<DatasetContent<Data>>
static open
Opens a dataset and returns a promise resolving to an instance of the Dataset class.
Datasets are used to store structured data where each object stored has the same attributes, such as online store products or real estate offers. The actual data is stored either on the local filesystem or in the cloud.
For more details and code examples, see the Dataset class.
Parameters
optional datasetIdOrName: null | string
ID or name of the dataset to be opened. If null or undefined, the function returns the default dataset associated with the crawler run.
optional options: StorageManagerOptions = {}
Storage manager options.
Returns Promise<Dataset<Data>>
The Dataset class represents a store for structured data where each object stored has the same attributes, such as online store products or real estate offers. You can imagine it as a table, where each object is a row and its attributes are columns. Dataset is an append-only storage: you can only add new records to it, but you cannot modify or remove existing records. Typically it is used to store crawling results.
Do not instantiate this class directly, use the Dataset.open function instead.
Dataset stores its data either on local disk or in the Apify cloud, depending on whether the APIFY_LOCAL_STORAGE_DIR or APIFY_TOKEN environment variables are set.
If the APIFY_LOCAL_STORAGE_DIR environment variable is set, the data is stored in the local directory in the following files:
Note that {DATASET_ID} is the name or ID of the dataset. The default dataset has ID: default, unless you override it by setting the APIFY_DEFAULT_DATASET_ID environment variable. Each dataset item is stored as a separate JSON file, where {INDEX} is a zero-based index of the item in the dataset.
If the APIFY_TOKEN environment variable is set but APIFY_LOCAL_STORAGE_DIR is not, the data is stored in the Apify Dataset cloud storage. Note that you can also force usage of the cloud storage by passing the forceCloud option to the Dataset.open function, even if the APIFY_LOCAL_STORAGE_DIR variable is set.
Example usage: