Skip to main content

Scrapy vs. Crawlee

· 11 min read
Saurav Jain
Developer Community Manager

Hey, crawling masters!

Welcome to another post on the Crawlee blog; this time, we are going to compare Scrapy, one of the oldest and most popular web scraping libraries in the world, with Crawlee, a relative newcomer. This article will answer your questions about when to use Scrapy and help you decide when it would be better to use Crawlee instead. This article will be the first in a series comparing the various technical aspects of Crawlee with Scrapy.

Introduction:

Scrapy is an open-source Python-based web scraping framework that extracts data from websites. With Scrapy, you create spiders, which are autonomous scripts to download and process web content. The limitation of Scrapy is that it does not work very well with JavaScript rendered websites, as it was designed for static HTML pages. We will do a comparison later in the article about this.

Crawlee is also an open-source library that originated as Apify SDK. Crawlee has the advantage of being the latest library in the market, so it already has many features that Scrapy lacks, like autoscaling, headless browsing, working with JavaScript rendered websites without any plugins, and many more, which we are going to explain later on.

How to scrape Amazon products

· 12 min read
Lukáš Průša
Junior Web Automation Engineer

Introduction

Amazon is one of the largest and most complex websites, which means scraping it is pretty challenging. Thankfully, the Crawlee library makes things a little easier, with utilities like JSON file outputs, automatic scaling, and request queue management.

In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.

How to scrape Amazon using Typescript, Cheerio, and Crawlee

Launching Crawlee Blog

· 3 min read
Saurav Jain
Developer Community Manager

Hey, crawling masters!

I’m Saurav, Developer Community Manager at Apify, and I’m thrilled to announce that we’re launching the Crawlee blog today 🎉

We launched Crawlee, the successor to our Apify SDK, in August 2022 to make the best web scraping and automation library for Node.js developers who like to write code in JavaScript or TypeScript.

Since then, our dev community has grown exponentially. I’m proud to tell you that we have over 11,500 Stars on GitHub, over 6,000 community members on our Discord, and over 125,000 downloads monthly on npm. We’re now the most popular web scraping and automation library for Node.js developers 👏