
HttpxHttpClient

crawlee.http_clients._httpx.HttpxHttpClient

HTTP client based on the HTTPX library.

This client uses the HTTPX library to perform HTTP requests in crawlers (BasicCrawler subclasses) and to manage sessions, proxies, and error handling.

See the BaseHttpClient class for information common to all HTTP clients.

Index

Constructors

Methods

Constructors

__init__

  • __init__(*, persist_cookies_per_session, additional_http_error_status_codes, ignore_http_error_status_codes, http1, http2, async_client_kwargs): None
  • Create a new instance.


    Parameters

    • persist_cookies_per_session: bool = True (keyword-only)
    • additional_http_error_status_codes: Iterable[int] = () (keyword-only)
    • ignore_http_error_status_codes: Iterable[int] = () (keyword-only)
    • http1: bool = True (keyword-only)
    • http2: bool = True (keyword-only)
    • async_client_kwargs: Any

    Returns None
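
    The two status-code parameters above are easiest to understand side by side. The following is a minimal sketch of how such an include/exclude pair could classify a response status. It is not the library's implementation: the function name is_error_status is hypothetical, the baseline of treating every 4xx and 5xx status as an error is an assumption, and so is the rule that an "additional" code wins over an "ignored" one.

```python
from collections.abc import Iterable


def is_error_status(
    status_code: int,
    *,
    additional: Iterable[int] = (),
    ignore: Iterable[int] = (),
) -> bool:
    """Hypothetical helper: decide whether a status code counts as an error.

    Assumptions (not taken from the library):
    - 4xx and 5xx are errors by default;
    - codes in `additional` are always errors;
    - codes in `ignore` are never errors unless also in `additional`.
    """
    if status_code in set(additional):
        return True
    if status_code in set(ignore):
        return False
    return 400 <= status_code <= 599


# Example: treat 404 as acceptable, but flag a 200 "soft block" page as an error.
print(is_error_status(404, ignore=[404]))
print(is_error_status(200, additional=[200]))
```

    Under these assumptions, a crawler that expects missing pages would pass ignore_http_error_status_codes=[404], while a site that signals blocking with an unusual success code would be covered by additional_http_error_status_codes.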

Methods

crawl

  • async crawl(request, *, session, proxy_info, statistics): HttpCrawlingResult
  • Parameters

    • request: Request
    • session: Session | None = None (keyword-only)
    • proxy_info: ProxyInfo | None = None (keyword-only)
    • statistics: Statistics | None = None (keyword-only)

    Returns HttpCrawlingResult

send_request

  • async send_request(url, *, method, headers, query_params, data, session, proxy_info): HttpResponse
  • Parameters

    • url: str
    • method: HttpMethod = 'GET' (keyword-only)
    • headers: HttpHeaders | None = None (keyword-only)
    • query_params: dict[str, Any] | None = None (keyword-only)
    • data: dict[str, Any] | None = None (keyword-only)
    • session: Session | None = None (keyword-only)
    • proxy_info: ProxyInfo | None = None (keyword-only)

    Returns HttpResponse