
HttpClient

An abstract base class for HTTP clients used in crawlers (BasicCrawler subclasses).

Methods

__init__

  • __init__(*, persist_cookies_per_session): None
  • Initialize a new instance.


    Parameters

    • optional, keyword-only persist_cookies_per_session: bool = True

      Whether to persist cookies per HTTP session.

    Returns None
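The reference does not say how cookies are stored. The following is a minimal sketch, assuming that "per session" means one cookie jar per session and that disabling the flag means nothing is kept between requests. PerSessionCookieStore and its save/load methods are hypothetical and are not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PerSessionCookieStore:
    """Hypothetical helper: one cookie jar (a plain dict) per session ID."""

    persist_cookies_per_session: bool = True
    _jars: dict[str, dict[str, str]] = field(default_factory=dict)

    def save(self, session_id: str | None, cookies: dict[str, str]) -> None:
        # Remember cookies only when persistence is on and the request has a session.
        if self.persist_cookies_per_session and session_id is not None:
            self._jars.setdefault(session_id, {}).update(cookies)

    def load(self, session_id: str | None) -> dict[str, str]:
        # Unknown or missing sessions start with no cookies; return a copy.
        if session_id is None:
            return {}
        return dict(self._jars.get(session_id, {}))


store = PerSessionCookieStore()
store.save('session-a', {'token': 'abc'})
print(store.load('session-a'))  # {'token': 'abc'}
print(store.load('session-b'))  # {}
```

Keeping one jar per session means cookies set during one session's requests never leak into another session's requests.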

crawl

  • async crawl(request, *, session, proxy_info, statistics): HttpCrawlingResult
  • Perform the crawling for a given request.

    This method is called from crawler.run().


    Parameters

    • request: Request

      The request to be crawled.

    • optional, keyword-only session: Session | None = None

      The session associated with the request.

    • optional, keyword-only proxy_info: ProxyInfo | None = None

      The information about the proxy to be used.

    • optional, keyword-only statistics: Statistics | None = None

      The statistics object in which to register response status codes.

    Returns HttpCrawlingResult
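The following minimal sketch shows how a concrete client might implement this abstract method. It uses stand-in types: Request, Statistics, and HttpCrawlingResult below are simplified placeholders, not the library's classes, and the register_status_code method name is an assumption. The hypothetical StubHttpClient returns canned status codes instead of making real requests and records each one in the statistics object when one is passed.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Request:
    """Stand-in for the library's Request; only the URL is modeled."""

    url: str


@dataclass
class Statistics:
    """Stand-in statistics object that only counts status codes."""

    status_codes: Counter = field(default_factory=Counter)

    def register_status_code(self, code: int) -> None:
        self.status_codes[code] += 1


@dataclass
class HttpCrawlingResult:
    """Stand-in result that carries only the response status code."""

    status_code: int


class BaseHttpClient(ABC):
    """Hypothetical abstract base mirroring the documented crawl() signature."""

    @abstractmethod
    async def crawl(
        self,
        request: Request,
        *,
        session: object | None = None,
        proxy_info: object | None = None,
        statistics: Statistics | None = None,
    ) -> HttpCrawlingResult: ...


class StubHttpClient(BaseHttpClient):
    """Returns canned status codes instead of performing network I/O."""

    def __init__(self, responses: dict[str, int]) -> None:
        self._responses = responses

    async def crawl(
        self,
        request: Request,
        *,
        session: object | None = None,
        proxy_info: object | None = None,
        statistics: Statistics | None = None,
    ) -> HttpCrawlingResult:
        # session and proxy_info are accepted but ignored by this stub.
        status = self._responses.get(request.url, 404)
        if statistics is not None:
            statistics.register_status_code(status)
        return HttpCrawlingResult(status_code=status)


async def main() -> Statistics:
    client = StubHttpClient({'https://example.com': 200})
    stats = Statistics()
    for url in ['https://example.com', 'https://example.com/missing']:
        result = await client.crawl(Request(url), statistics=stats)
        print(url, result.status_code)
    return stats


stats = asyncio.run(main())
print(dict(stats.status_codes))  # {200: 1, 404: 1}
```

Because crawl is abstract, the base class cannot be instantiated directly; a crawler would only ever call it on a concrete subclass.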

send_request

  • async send_request(url, *, method, headers, payload, session, proxy_info): HttpResponse
  • Send an HTTP request via the client.

    This method is called from the context.send_request() helper.


    Parameters

    • url: str

      The URL to send the request to.

    • optional, keyword-only method: Literal['GET', 'HEAD', 'POST', 'PUT', 'DELETE', 'CONNECT', 'OPTIONS', 'TRACE', 'PATCH'] = 'GET'

      The HTTP method to use.

    • optional, keyword-only headers: (HttpHeaders | dict[str, str]) | None = None

      The headers to include in the request.

    • optional, keyword-only payload: bytes | None = None

      The data to be sent as the request body.

    • optional, keyword-only session: Session | None = None

      The session associated with the request.

    • optional, keyword-only proxy_info: ProxyInfo | None = None

      The information about the proxy to be used.

    Returns HttpResponse
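The following minimal sketch shows how send_request's parameters could map onto an outgoing request. It assumes a client that performs no network I/O. EchoHttpClient and FakeResponse are hypothetical, with FakeResponse standing in for HttpResponse. The request is built with the standard library's urllib.request.Request, which only models a request. Instead of sending anything, the sketch echoes the method, headers, and body back, and it accepts but ignores session and proxy_info.

```python
from __future__ import annotations

import asyncio
import urllib.request
from dataclasses import dataclass
from typing import Literal, get_args

HttpMethod = Literal['GET', 'HEAD', 'POST', 'PUT', 'DELETE', 'CONNECT', 'OPTIONS', 'TRACE', 'PATCH']


@dataclass
class FakeResponse:
    """Hypothetical stand-in for HttpResponse."""

    status_code: int
    method: str
    headers: dict[str, str]
    body: bytes


class EchoHttpClient:
    """Hypothetical client: builds the request but echoes it back instead of sending it."""

    async def send_request(
        self,
        url: str,
        *,
        method: HttpMethod = 'GET',
        headers: dict[str, str] | None = None,
        payload: bytes | None = None,
        session: object | None = None,
        proxy_info: object | None = None,
    ) -> FakeResponse:
        # Reject anything outside the documented set of methods.
        if method not in get_args(HttpMethod):
            raise ValueError(f'Unsupported HTTP method: {method}')
        # urllib's Request only models the request; nothing is sent here.
        # Note: urllib normalizes header names with str.capitalize().
        req = urllib.request.Request(url, data=payload, headers=headers or {}, method=method)
        return FakeResponse(
            status_code=200,
            method=req.get_method(),
            headers=dict(req.header_items()),
            body=req.data or b'',
        )


async def main() -> None:
    client = EchoHttpClient()
    response = await client.send_request(
        'https://example.com/api', method='POST', headers={'Accept': 'application/json'}, payload=b'{"q": 1}'
    )
    print(response.method, response.body)  # POST b'{"q": 1}'


asyncio.run(main())
```

A real implementation would hand the same method, headers, and payload to an HTTP library and would use the session and proxy information when sending.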
