# Impit HTTP Client

## Introduction
The `ImpitHttpClient` is an HTTP client implementation based on the Impit library. It enables browser impersonation for HTTP requests, helping you bypass bot detection systems without running an actual browser.
Impit is the successor to `got-scraping`, which is no longer actively maintained. We recommend using `ImpitHttpClient` for all new projects. Impit provides better anti-bot evasion through TLS fingerprinting and HTTP/3 support, while maintaining a smaller package size.
Impit will become the default HTTP client in the next major version of Crawlee.
## Why use Impit?
Websites increasingly use sophisticated bot detection that analyzes:
- **HTTP fingerprints**: User-Agent strings, header ordering, HTTP/2 pseudo-header sequences
- **TLS fingerprints**: cipher suites, TLS extensions, and cryptographic details in the ClientHello message
Standard HTTP clients like `fetch` or `axios` are easily detected because their fingerprints don't match real browsers. Unlike `got-scraping`, which only handles HTTP-level fingerprinting, Impit also mimics TLS fingerprints, making requests appear to come from real browsers.
## Installation

Install the `@crawlee/impit-client` package:

```bash
npm install @crawlee/impit-client
```
The `impit` package ships with prebuilt native binaries and supports Windows, macOS (including ARM), and Linux out of the box.
## Basic usage

Pass the `ImpitHttpClient` instance to the `httpClient` option of any Crawlee crawler:
```ts
import { BasicCrawler } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new BasicCrawler({
    httpClient: new ImpitHttpClient({
        browser: Browser.Firefox,
    }),
    async requestHandler({ sendRequest, log }) {
        const response = await sendRequest();
        log.info('Received response', { statusCode: response.statusCode });
    },
});

await crawler.run(['https://example.com']);
```
## Usage with different crawlers

### CheerioCrawler
```ts
import { CheerioCrawler } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({
        browser: Browser.Chrome,
    }),
    async requestHandler({ $, request, enqueueLinks, pushData }) {
        const title = $('title').text();
        const h1 = $('h1').first().text();

        await pushData({
            url: request.url,
            title,
            h1,
        });

        // Enqueue links found on the page
        await enqueueLinks();
    },
});

await crawler.run(['https://example.com']);
```
### HttpCrawler
```ts
import { HttpCrawler } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new HttpCrawler({
    httpClient: new ImpitHttpClient({
        browser: Browser.Firefox,
        http3: true,
    }),
    async requestHandler({ body, request, log, pushData }) {
        log.info(`Processing ${request.url}`);

        // body is the raw HTML string
        await pushData({
            url: request.url,
            bodyLength: body.length,
        });
    },
});

await crawler.run(['https://example.com']);
```
## Configuration options

The `ImpitHttpClient` constructor accepts the following options:
| Option | Type | Default | Description |
|---|---|---|---|
| `browser` | `'chrome' \| 'firefox' \| undefined` | `undefined` | Browser to impersonate. Affects the TLS fingerprint and default headers. |
| `http3` | `boolean` | `false` | Enable HTTP/3 (QUIC) protocol support. |
| `ignoreTlsErrors` | `boolean` | `false` | Ignore TLS certificate errors. Useful for testing or self-signed certificates. |
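As an illustration, the options can be combined freely; a minimal configuration sketch (reserve `ignoreTlsErrors` for local or staging environments, since it disables certificate validation):

```ts
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

// Configuration sketch using all three documented options.
// ignoreTlsErrors skips TLS certificate checks, which is handy for
// hosts with self-signed certificates but unsafe against production targets.
const client = new ImpitHttpClient({
    browser: Browser.Firefox,
    http3: true,
    ignoreTlsErrors: true,
});
```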
## Browser impersonation

Use the `Browser` enum to specify which browser to impersonate:
```ts
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

// Impersonate Firefox
const firefoxClient = new ImpitHttpClient({ browser: Browser.Firefox });

// Impersonate Chrome
const chromeClient = new ImpitHttpClient({ browser: Browser.Chrome });
```
## Advanced configuration
```ts
import { CheerioCrawler } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({
        // Impersonate Chrome browser
        browser: Browser.Chrome,
        // Enable HTTP/3 protocol
        http3: true,
    }),
    async requestHandler({ $ }) {
        console.log(`Title: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);
```
## Proxy support

Proxies are configured per-request through Crawlee's proxy management system, not on the `ImpitHttpClient` itself. Use `ProxyConfiguration` as you normally would:
```ts
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy1.example.com:8080', 'http://proxy2.example.com:8080'],
});

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({ browser: Browser.Chrome }),
    proxyConfiguration,
    async requestHandler({ $, request }) {
        console.log(`Scraped ${request.url}`);
    },
});

await crawler.run(['https://example.com']);
```
## How it works

Impit achieves browser impersonation at two levels:

- **HTTP level**: mimics browser-specific header ordering, HTTP/2 settings, and pseudo-header sequences that anti-bot services analyze.
- **TLS level**: uses a patched version of `rustls` to replicate the exact TLS ClientHello message that browsers send, including cipher suites and extensions.
This dual-layer approach makes requests appear to come from a real browser, significantly reducing blocks from bot detection systems.
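Because `ImpitHttpClient` implements Crawlee's HTTP client interface, it can also be called outside a crawler. A rough sketch of direct use; the exact request and response shapes are an assumption here, so check the `BaseHttpClient` typings in your Crawlee version:

```ts
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

// Standalone usage sketch (field names assumed from the
// BaseHttpClient interface, not verified against every version).
const client = new ImpitHttpClient({ browser: Browser.Firefox });

const response = await client.sendRequest({
    url: 'https://example.com',
    method: 'GET',
});

console.log(response.statusCode);
```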
## Comparison with other solutions
| Feature | got-scraping | curl-impersonate | Impit |
|---|---|---|---|
| TLS fingerprinting | No | Yes | Yes |
| HTTP/3 support | No | Yes | Yes |
| Native Node.js package | Yes | No (child process) | Yes |
| Windows/macOS ARM | Yes | No | Yes |
| Package size | ~10 MB | ~20 MB | ~8 MB |
## Related links