
Add data to dataset

This example demonstrates how to store extracted data in datasets using the context.push_data() helper function. If the specified dataset does not already exist, it is created automatically. You can also save data to a custom dataset by passing the dataset_id or dataset_name parameter to push_data.

import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

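
async def main() -> None:
    crawler = BeautifulSoupCrawler()

    # Define the default request handler, which will be called for every request.
    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        # Extract data from the page.
        data = {
            'url': context.request.url,
            # Guard against pages that have no <title> element.
            'title': context.soup.title.string if context.soup.title else None,
            # Keep only the first 1000 characters of the HTML.
            'html': str(context.soup)[:1000],
        }

        # Push the extracted data to the default dataset.
        await context.push_data(data)

    # Run the crawler with the initial list of requests.
    # Placeholder start URL (assumption); replace it with the pages you want to crawl.
    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())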

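The automatic creation of datasets and the dataset_name routing described at the top of this page can be pictured with a small in-memory model. This is a hypothetical standard-library sketch, not Crawlee's implementation: the ToyStorage class, its methods, and the 'default' key are illustrative names chosen here, and the toy omits dataset_id entirely.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


class ToyStorage:
    """Hypothetical in-memory stand-in for named datasets (not Crawlee's API)."""

    def __init__(self) -> None:
        # defaultdict creates an empty list the first time a dataset name is written to,
        # mimicking "created automatically if it does not already exist".
        self._datasets: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)

    def push_data(self, data: dict[str, Any], dataset_name: str | None = None) -> None:
        # Without a dataset_name, items go to the default dataset.
        self._datasets[dataset_name or 'default'].append(data)

    def items(self, dataset_name: str | None = None) -> list[dict[str, Any]]:
        # .get() does not create a dataset, so reading an unknown name returns [].
        return list(self._datasets.get(dataset_name or 'default', []))

    def names(self) -> list[str]:
        return sorted(self._datasets)


storage = ToyStorage()
storage.push_data({'url': 'https://example.com'})
storage.push_data({'key': 'value'}, dataset_name='my-results')
print(storage.names())  # ['default', 'my-results']
print(storage.items('my-results'))  # [{'key': 'value'}]
```

Only writes create a dataset in this model; reads of a missing name leave the registry unchanged.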
Each item in the dataset will be stored in its own file within the following directory:


For more control, you can also open a dataset manually with the asynchronous constructor Dataset.open() and interact with it directly:

import asyncio

from crawlee.storages import Dataset

# ...

async def main() -> None:
    # Open the dataset manually using the asynchronous constructor open().
    dataset = await Dataset.open()

    # Interact with the dataset directly.
    await dataset.push_data({'key': 'value'})

    # ...


if __name__ == '__main__':
    asyncio.run(main())
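Storing each dataset item in its own file, as described earlier on this page, can be sketched with the standard library alone. This is an illustrative model rather than Crawlee's storage code: the write_items() and read_items() helpers, the directory layout, and the zero-padded nine-digit JSON file names are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def write_items(directory: Path, items: list[dict[str, Any]]) -> list[Path]:
    """Hypothetical helper: write each item to its own numbered JSON file."""
    directory.mkdir(parents=True, exist_ok=True)
    # Continue numbering after the files already present in the directory.
    start = len(list(directory.glob('*.json')))
    paths = []
    for offset, item in enumerate(items, start=1):
        # Assumed naming scheme: 000000001.json, 000000002.json, ...
        path = directory / f'{start + offset:09d}.json'
        path.write_text(json.dumps(item, ensure_ascii=False, indent=2), encoding='utf-8')
        paths.append(path)
    return paths


def read_items(directory: Path) -> list[dict[str, Any]]:
    """Read items back in insertion order (zero-padded names sort correctly)."""
    return [json.loads(p.read_text(encoding='utf-8')) for p in sorted(directory.glob('*.json'))]


with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = Path(tmp) / 'datasets' / 'default'
    write_items(dataset_dir, [{'key': 'value'}])
    write_items(dataset_dir, [{'url': 'https://example.com', 'title': 'Example'}])
    print(sorted(p.name for p in dataset_dir.iterdir()))
    # ['000000001.json', '000000002.json']
```

Zero-padding keeps lexicographic file order equal to insertion order, which is why read_items() can simply sort by name.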