How to build a price tracker with Crawlee and Apify
Build a price tracker with Crawlee for Python to scrape product details, export data in multiple formats, and send email alerts for price drops, then deploy and schedule it as an Apify Actor.
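The alerting step lends itself to a quick illustration. Below is a minimal sketch using Python's standard smtplib; the product name, prices, SMTP host, addresses, and credentials are all hypothetical placeholders, not the article's actual implementation.

```python
import smtplib
from email.message import EmailMessage

def send_price_alert(product: str, old_price: float, new_price: float) -> None:
    """Email an alert when a tracked product's price drops."""
    if new_price >= old_price:
        return  # no drop, nothing to send

    msg = EmailMessage()
    msg["Subject"] = f"Price drop: {product}"
    msg["From"] = "alerts@example.com"  # placeholder sender
    msg["To"] = "me@example.com"        # placeholder recipient
    msg.set_content(f"{product} dropped from {old_price} to {new_price}.")

    # Placeholder SMTP host and credentials.
    with smtplib.SMTP_SSL("smtp.example.com") as server:
        server.login("alerts@example.com", "app-password")
        server.send_message(msg)
```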
Bluesky is an emerging social network developed by former members of the Twitter (now X) development team. The platform has been showing significant growth recently, reaching 140.3 million visits according to SimilarWeb. Like X, Bluesky generates a vast amount of data that can be used for analysis. In this article, we'll explore how to collect this data using Crawlee for Python.
One of our community members wrote this blog as a contribution to the Crawlee Blog. If you'd like to contribute articles like these, please reach out to us on our Discord channel.
Key steps we will cover:
Crawlee for Python v0.6 is here, and it's packed with new features and important bug fixes. If you're upgrading from a previous version, please take a moment to review the breaking changes detailed below to ensure a smooth transition.
SuperScraper is an open-source Actor that combines features from various web scraping services, including ScrapingBee, ScrapingAnt, and ScraperAPI.
A key capability is its standby mode, which runs the Actor as a persistent API server. This removes the usual start-up delay - a common pain point in many systems - and lets users make direct API calls to interact with the system immediately.
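To give a feel for what standby mode means in practice, here is a minimal sketch of calling such a persistent API from Python. The host, query parameter, and auth header are assumptions for illustration; the SuperScraper README documents the real endpoint and supported parameters.

```python
import httpx

# Hypothetical call to a standby Actor running as a persistent API.
# The host, parameters, and auth below are illustrative placeholders.
response = httpx.get(
    "https://superscraper.apify.actor/",
    params={"url": "https://example.com"},
    headers={"Authorization": "Bearer <YOUR_APIFY_TOKEN>"},
    timeout=60.0,
)
response.raise_for_status()
print(response.text[:200])  # first bytes of the scraped page
```

Because the server is already running, there is no Actor start-up to wait for: the request is answered like a call to any ordinary web API.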
This blog explains how SuperScraper works, highlights its implementation details, and provides code snippets to demonstrate its core functionality.
Crawlee for Python v0.5 is now available! This is our biggest release to date, bringing newly ported functionality from Crawlee for JavaScript, brand-new features that are exclusive to the Python library (for now), a new consolidated package structure, and a bunch of bug fixes and further improvements.
Python developers know the drill: you need reliable company data, and Crunchbase has it. This guide shows you how to build an effective Crunchbase scraper in Python that gets you the data you need.
Crunchbase tracks details that matter: locations, business focus, founders, and investment histories. Manual extraction from such a large dataset isn't practical; automation is essential for transforming this information into an analyzable format.
By the end of this blog, we'll explore three different ways to extract data from Crunchbase using Crawlee for Python. We'll fully implement two of them and discuss the specifics and challenges of the third. This will help us better understand how important it is to choose the right data source.
This guide comes from a developer in our growing community. Have you built interesting projects with Crawlee? Join us on Discord to share your experiences and blog ideas - we value these contributions from developers like you.
Key steps we'll cover:
Millions of people use Google Maps daily, leaving behind a goldmine of data just waiting to be analyzed. In this guide, I'll show you how to build a reliable scraper using Crawlee and Python to extract locations, ratings, and reviews from Google Maps, all while handling its dynamic content challenges.
One of our community members wrote this blog as a contribution to the Crawlee Blog. If you would like to contribute blogs like these to the Crawlee Blog, please reach out to us on our Discord channel.
We’ll collect information about hotels in a specific city. You can also customize your search to meet your requirements. For example, you might search for "hotels near me", "5-star hotels in Bombay", or other similar queries.
We’ll extract important data, including the hotel name, rating, review count, price, a link to the hotel page on Google Maps, and all available amenities. Here’s an example of what the extracted data will look like:
```json
{
  "name": "Vividus Hotels, Bangalore",
  "rating": "4.3",
  "reviews": "633",
  "price": "₹3,667",
  "amenities": [
    "Pool available",
    "Free breakfast available",
    "Free Wi-Fi available",
    "Free parking available"
  ],
  "link": "https://www.google.com/maps/place/Vividus+Hotels+,+Bangalore/..."
}
```
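Since Google Maps renders its results with JavaScript, a browser-based crawler is the natural starting point. The skeleton below is a minimal sketch using Crawlee for Python's PlaywrightCrawler; the selector and search URL are placeholders, as Google's markup uses obfuscated class names that change frequently.

```python
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    # Headless browser crawler, since Google Maps renders content with JS.
    crawler = PlaywrightCrawler(headless=True)

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        # Wait for the results feed to render; the selector is a placeholder.
        await context.page.wait_for_selector('[role="feed"]')
        # Extraction of name, rating, reviews, price, and amenities goes here.

    await crawler.run(['https://www.google.com/maps/search/hotels+in+bangalore'])


if __name__ == '__main__':
    asyncio.run(main())
```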
Scraping Google Search delivers essential SERP analysis, SEO optimization, and data collection capabilities. Modern scraping tools make this process faster and more reliable.
One of our community members wrote this blog as a contribution to the Crawlee Blog. If you would like to contribute blogs like these to the Crawlee Blog, please reach out to us on our Discord channel.
In this guide, we'll create a Google Search scraper using Crawlee for Python that can handle result ranking and pagination.
We'll create a scraper that:
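As a rough starting point, a minimal sketch with Crawlee for Python's BeautifulSoupCrawler might look like the following; the CSS selectors and search URL are placeholders, since Google's SERP markup changes often.

```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        # Placeholder selectors; inspect the live SERP to find current ones.
        for position, result in enumerate(context.soup.select('div.g'), start=1):
            title = result.select_one('h3')
            link = result.select_one('a')
            if title and link:
                await context.push_data({
                    'rank': position,
                    'title': title.get_text(),
                    'url': link.get('href'),
                })

    await crawler.run(['https://www.google.com/search?q=crawlee+for+python'])


if __name__ == '__main__':
    asyncio.run(main())
```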
GraphQL is a query language for getting deeply nested structured data from a website's backend, similar to MongoDB queries.
The request is usually a POST to some general /graphql endpoint with a body like this:
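```python
import httpx

# A generic illustration: the endpoint, operation name, query text,
# and variables below are hypothetical, not from any specific site.
query = """
query ProductDetail($id: ID!) {
  product(id: $id) {
    name
    price
    reviews(first: 5) {
      rating
      text
    }
  }
}
"""

response = httpx.post(
    "https://example.com/graphql",
    json={
        "operationName": "ProductDetail",
        "query": query,
        "variables": {"id": "1234"},
    },
)
print(response.json()["data"])
```

The endpoint, operation name, and query above are generic placeholders; the server replies with a JSON document whose shape mirrors the query.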
When scraping data from websites using GraphQL, it’s common to inspect the network requests in developer tools to find the exact queries being used. However, on some websites, you might notice that the GraphQL query itself isn’t visible in the request. Instead, you only see a cryptic hash value. This can be confusing and makes it harder to understand how data is being requested from the server.
This is because some websites use a feature called "persisted queries." It's a performance optimization that reduces the amount of data sent with each request by replacing the full query text with a precomputed hash. While this improves website speed and efficiency, it introduces challenges for scraping because the query text isn't readily available.
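For example, an Apollo-style persisted-query request replaces the query field with a version number and a SHA-256 hash of the query text. A rough sketch, with a placeholder endpoint, operation, variables, and hash:

```python
import httpx

# Persisted-query request: only a hash of the query text is sent.
# Endpoint, operation name, variables, and hash are placeholders.
response = httpx.post(
    "https://example.com/graphql",
    json={
        "operationName": "ProductDetail",
        "variables": {"id": "1234"},
        "extensions": {
            "persistedQuery": {
                "version": 1,
                "sha256Hash": "<64-hex-character hash of the query text>",
            }
        },
    },
)
```

If the server doesn't recognize the hash, Apollo-style servers respond with a PersistedQueryNotFound error, which is why scrapers usually capture a working hash from the browser's network traffic.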
Typically, tutorials focus on the technical aspects, on what you can replicate: "Start here, follow this path, and you'll end up here." This is great for learning a particular technology, but it's sometimes difficult to understand why the author decided to do things a certain way or what guides their development process.
One of our community members wrote this blog as a contribution to the Crawlee Blog. If you want to contribute blogs like these to the Crawlee Blog, please reach out to us on our Discord channel.
In this blog, I'll discuss the general rules and principles that guide me when I work on web scraping projects and allow me to achieve great results.
So, let's explore the mindset of a web scraping developer.