Crawlee is a web scraping library for JavaScript and Python. It handles blocking, crawling, proxies, and browsers for you.
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, pushData, log }) {
        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);
        await pushData({ title, url: request.loadedUrl });
        await enqueueLinks();
    },
    // Uncomment this option to see the browser window.
    // headless: false,
});

await crawler.run(['https://crawlee.dev']);
$ npx crawlee create my-crawler
Crawlee crawls stealthily with zero configuration, but you can customize its behavior to overcome any protection. Real-world fingerprints included.
{
    fingerprintOptions: {
        fingerprintGeneratorOptions: {
            browsers: ['chrome', 'firefox'],
            devices: ['mobile'],
            locales: ['en-US'],
        },
    },
},
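The fragment above only shows the fingerprint settings themselves. As a rough sketch of where they might sit in a full set of crawler options, the example below assumes they are passed under `browserPoolOptions`, which is how Crawlee's browser crawlers appear to accept them. The `crawlerOptions` name and the handler body are illustrative only.

```javascript
// Sketch only: in a real project this object would be passed as
// `browserPoolOptions` to `new PlaywrightCrawler({ ... })` (assumed placement).
const browserPoolOptions = {
    fingerprintOptions: {
        fingerprintGeneratorOptions: {
            browsers: ['chrome', 'firefox'],
            devices: ['mobile'],
            locales: ['en-US'],
        },
    },
};

// Hypothetical crawler options combining the fingerprint settings with a handler.
const crawlerOptions = {
    browserPoolOptions,
    async requestHandler({ request, log }) {
        log.info(`Visited ${request.loadedUrl}`);
    },
};
```

Keeping the fingerprint settings in their own object makes it easy to reuse them across several crawlers.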
Crawlee integrates BeautifulSoup, Cheerio, Puppeteer, Playwright, and other popular open-source tools. No need to learn new syntax.
Switch between HTTP and headless browser crawling without big rewrites thanks to a shared API. Or let the adaptive crawler decide whether JavaScript rendering is needed.
import { AdaptivePlaywrightCrawler } from 'crawlee';

const crawler = new AdaptivePlaywrightCrawler({
    renderingTypeDetectionRatio: 0.1,
    async requestHandler({ querySelector, enqueueLinks }) {
        // The crawler detects if JS rendering is needed
        // to extract this data. If not, it will use HTTP
        // for follow-up requests to save time and costs.
        const $prices = await querySelector('span.price');
        await enqueueLinks();
    },
});
Pause and resume crawlers thanks to a persistent queue of URLs and storage for structured data.
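To make the pause-and-resume idea concrete, here is a minimal, dependency-free sketch of a URL queue whose state can be serialized and restored. `ResumableQueue` and its methods are hypothetical names invented for this illustration. This is not Crawlee's queue implementation, and a real crawler would persist the snapshot to disk rather than keep it in a variable.

```javascript
// Hypothetical sketch: a URL queue whose state can be saved and restored.
// It only illustrates the idea and is not Crawlee's request queue.
class ResumableQueue {
    constructor(state = { pending: [], handled: [] }) {
        this.pending = [...state.pending];
        this.handled = new Set(state.handled);
    }

    // Add a URL unless it is already waiting or has been processed.
    add(url) {
        if (this.handled.has(url) || this.pending.includes(url)) return false;
        this.pending.push(url);
        return true;
    }

    // Take the next URL to process, or undefined when the queue is empty.
    next() {
        return this.pending.shift();
    }

    // Record that a URL has been fully processed.
    markHandled(url) {
        this.handled.add(url);
    }

    // Serialize the state; a real crawler would write this to disk.
    snapshot() {
        return JSON.stringify({ pending: this.pending, handled: [...this.handled] });
    }

    static restore(json) {
        return new ResumableQueue(JSON.parse(json));
    }
}

// First run: process one URL, then "pause" by taking a snapshot.
const firstRun = new ResumableQueue();
firstRun.add('https://example.com/a');
firstRun.add('https://example.com/b');
firstRun.markHandled(firstRun.next());
const saved = firstRun.snapshot();

// Second run: restore the state and continue where the first run stopped.
const secondRun = ResumableQueue.restore(saved);
const duplicateAccepted = secondRun.add('https://example.com/a');
const resumedUrl = secondRun.next();
```

Because handled URLs are stored alongside pending ones, the restored queue rejects a page it has already processed and resumes with the next unvisited URL.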
Sitemaps, infinite scroll, contact extraction, large asset blocking and many more utils included.
Keep your code clean and organized while managing complex crawls with a built-in router that streamlines the process.
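The routing idea can be sketched without any dependencies: each request carries an optional label, and the router dispatches it to the handler registered for that label, falling back to a default handler. `createRouter` below is a hypothetical helper written for this illustration. The method names `addHandler` and `addDefaultHandler` mirror the ones Crawlee's router exposes, but the body is only an approximation of the concept.

```javascript
// Minimal, dependency-free sketch of label-based routing.
// `createRouter` is a hypothetical helper, not part of Crawlee's API.
function createRouter() {
    const handlers = new Map();
    let defaultHandler = null;

    // The router itself is the function a crawler would call for every request.
    const router = (context) => {
        const handler = handlers.get(context.request.label) ?? defaultHandler;
        if (!handler) {
            throw new Error(`No handler for label '${context.request.label}'`);
        }
        return handler(context);
    };

    router.addHandler = (label, handler) => { handlers.set(label, handler); };
    router.addDefaultHandler = (handler) => { defaultHandler = handler; };
    return router;
}

const router = createRouter();
// Requests without a label (for example, listing pages) go to the default handler.
router.addDefaultHandler(({ request }) => `listing:${request.url}`);
// Requests labeled 'DETAIL' get their own handler.
router.addHandler('DETAIL', ({ request }) => `detail:${request.url}`);

const listingResult = router({ request: { url: 'https://example.com/' } });
const detailResult = router({ request: { url: 'https://example.com/item/1', label: 'DETAIL' } });
```

Splitting handlers by label keeps the logic for each page type in one place, instead of growing a single handler full of conditionals.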
Add Actor.init() to the beginning and Actor.exit() to the end of your code.
Copy a code example, install Crawlee, and go. No CLI required, no complex file structure, no boilerplate.
Unblocking, proxy rotation, and other core features are turned on by default, and all of them remain configurable.
Join our Discord community of over 10k developers and get fast answers to your web scraping questions.