CurlImpersonateHttpClient

crawlee.http_clients.curl_impersonate.CurlImpersonateHttpClient

HTTP client based on the curl-cffi library.

This client uses the curl-cffi library to perform HTTP requests in crawlers (BasicCrawler subclasses) and to manage sessions, proxies, and error handling.

See the BaseHttpClient class for information common to all HTTP clients.

Constructors

__init__

  • __init__(*, persist_cookies_per_session, additional_http_error_status_codes, ignore_http_error_status_codes, async_session_kwargs): None
  • Create a new instance.


    Parameters

    • persist_cookies_per_session: bool = True (keyword-only)
    • additional_http_error_status_codes: Iterable[int] = () (keyword-only)
    • ignore_http_error_status_codes: Iterable[int] = () (keyword-only)
    • async_session_kwargs: Any

    Returns None
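
The constructor takes keyword arguments only, so a configuration can be kept as a plain dictionary and unpacked into the call. The sketch below is a minimal illustration, not the library's recommended setup: the helper names build_options and make_client are hypothetical, the status codes are arbitrary examples, and it assumes that extra keyword arguments (async_session_kwargs) are forwarded to curl-cffi's AsyncSession, where impersonate is a known option.

```python
from typing import Any

# Defaults mirroring the constructor signature documented above.
DEFAULT_OPTIONS: dict[str, Any] = {
    "persist_cookies_per_session": True,
    "additional_http_error_status_codes": (403,),  # example value only
    "ignore_http_error_status_codes": (503,),      # example value only
    # Assumption: forwarded to curl-cffi's AsyncSession via async_session_kwargs.
    "impersonate": "chrome124",
}


def build_options(**overrides: Any) -> dict[str, Any]:
    """Return the defaults with any caller-supplied overrides applied."""
    return {**DEFAULT_OPTIONS, **overrides}


def make_client(**overrides: Any) -> Any:
    """Create a CurlImpersonateHttpClient from the merged options."""
    # Imported lazily so this module still loads where the optional
    # curl-impersonate dependency is not installed.
    from crawlee.http_clients.curl_impersonate import CurlImpersonateHttpClient

    return CurlImpersonateHttpClient(**build_options(**overrides))
```

Calling make_client(persist_cookies_per_session=False) would then create a client that does not keep cookies per session while leaving the other options unchanged.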

Methods

crawl

  • async crawl(request, *, session, proxy_info, statistics): HttpCrawlingResult
  • Parameters

    • request: Request
    • session: Session | None = None (keyword-only)
    • proxy_info: ProxyInfo | None = None (keyword-only)
    • statistics: Statistics | None = None (keyword-only)

    Returns HttpCrawlingResult
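
crawl is normally called by a crawler rather than by user code, but its calling convention can be sketched directly. The example below is an illustration under assumptions: crawl_in_order and crawl_blocking are hypothetical helpers, the client is any object exposing the crawl signature shown above, and the requests are assumed to be already constructed Request objects. It passes the same session to every call, which is where per-session cookie persistence would matter.

```python
import asyncio
from collections.abc import Iterable
from typing import Any


async def crawl_in_order(
    client: Any,
    requests: Iterable[Any],
    *,
    session: Any = None,
) -> list[Any]:
    """Crawl each request sequentially and collect the results in order."""
    results = []
    for request in requests:
        # `session` is keyword-only; proxy_info and statistics keep
        # their documented default of None.
        results.append(await client.crawl(request, session=session))
    return results


def crawl_blocking(client: Any, requests: Iterable[Any], *, session: Any = None) -> list[Any]:
    """Synchronous convenience wrapper for scripts without an event loop."""
    return asyncio.run(crawl_in_order(client, requests, session=session))
```

Each element of the returned list is whatever crawl returned for the corresponding request, an HttpCrawlingResult when a real client is used.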

send_request

  • async send_request(url, *, method, headers, query_params, data, session, proxy_info): HttpResponse
  • Parameters

    • url: str
    • method: HttpMethod = 'GET' (keyword-only)
    • headers: HttpHeaders | None = None (keyword-only)
    • query_params: dict[str, Any] | None = None (keyword-only)
    • data: dict[str, Any] | None = None (keyword-only)
    • session: Session | None = None (keyword-only)
    • proxy_info: ProxyInfo | None = None (keyword-only)

    Returns HttpResponse
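
Because everything after url is keyword-only, a call to send_request names each option explicitly. The following is a minimal sketch, assuming a client object with the signature documented above: fetch_page and fetch_page_blocking are hypothetical helpers, and the page query parameter is an invented example rather than anything the library defines.

```python
import asyncio
from typing import Any


async def fetch_page(client: Any, url: str, page: int) -> Any:
    """Send a GET request for one page of a paginated resource."""
    return await client.send_request(
        url,
        method="GET",                 # the documented default, spelled out
        query_params={"page": page},  # hypothetical query parameter
    )


def fetch_page_blocking(client: Any, url: str, page: int) -> Any:
    """Synchronous convenience wrapper for scripts without an event loop."""
    return asyncio.run(fetch_page(client, url, page))
```

With a real client, the awaited value is the HttpResponse returned by send_request. The remaining options (headers, data, session, proxy_info) are left at their default of None here.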