PlaywrightHttpClient

HTTP client based on the Playwright library.

This client uses the Playwright library to perform HTTP requests in crawlers (BasicCrawler subclasses) and to manage sessions, proxies, and error handling.

See the HttpClient class for general information about HTTP clients.

Note: This class is intended for use with PlaywrightCrawler only.

Hierarchy

HttpClient
- PlaywrightHttpClient

Index

Methods

- __aenter__
- __aexit__
- __init__
- cleanup
- crawl
- send_request
- stream

Properties

- active

Methods

__aenter__

async __aenter__(): HttpClient

Inherited from HttpClient.__aenter__
Initialize the client when entering the context manager.
Returns HttpClient

__aexit__

async __aexit__(exc_type, exc_value, traceback): None

Inherited from HttpClient.__aexit__
Deinitialize the client and clean up resources when exiting the context manager.
Parameters
- exc_type: BaseException | None
- exc_value: BaseException | None
- traceback: TracebackType | None
Returns None
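
The context-manager lifecycle described above can be illustrated with a minimal sketch. StandInClient is a hypothetical stand-in written for this example, not the real PlaywrightHttpClient, and the assumption that __aexit__ delegates to cleanup() is made only for illustration.

```python
import asyncio


class StandInClient:
    """Hypothetical stand-in that mimics the documented lifecycle only."""

    def __init__(self) -> None:
        self._active = False
        self.cleaned_up = False

    @property
    def active(self) -> bool:
        # Mirrors the documented `active` property.
        return self._active

    async def __aenter__(self) -> "StandInClient":
        # Initialize the client when entering the context manager.
        self._active = True
        return self

    async def __aexit__(self, exc_type, exc_value, traceback) -> None:
        # Assumption: exiting the context triggers cleanup().
        await self.cleanup()
        self._active = False

    async def cleanup(self) -> None:
        self.cleaned_up = True


async def demo() -> list:
    client = StandInClient()
    states = [client.active]          # before entering the context
    async with client:
        states.append(client.active)  # inside the context
    states.append(client.active)      # after leaving the context
    return states
```

Running asyncio.run(demo()) records the value of active before, inside, and after the async with block.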

__init__

__init__(): None

Overrides HttpClient.__init__
Initialize a new instance.
Returns None

cleanup

async cleanup(): None

Overrides HttpClient.cleanup
Clean up resources used by the client.

This method is called when the client is no longer needed and should be overridden in subclasses to perform any necessary cleanup such as closing connections, releasing file handles, or other resource deallocation.
Returns None

crawl

async crawl(request, *, session, proxy_info, statistics, timeout): HttpCrawlingResult

Overrides HttpClient.crawl
Perform the crawling for a given request.

This method is called from crawler.run().
Parameters
- request: Request
  The request to be crawled.
- session: Session | None = None (optional, keyword-only)
  The session associated with the request.
- proxy_info: ProxyInfo | None = None (optional, keyword-only)
  The information about the proxy to be used.
- statistics: Statistics | None = None (optional, keyword-only)
  The statistics object to register status codes.
- timeout: timedelta | None = None (optional, keyword-only)
  Maximum time allowed to process the request.
Returns HttpCrawlingResult
The result of the crawling.

send_request

async send_request(url, *, method, headers, payload, session, proxy_info, timeout): HttpResponse

Overrides HttpClient.send_request
Send an HTTP request via the client.

This method is called from the context.send_request() helper.
Parameters
- url: str
  The URL to send the request to.
- method: HttpMethod = 'GET' (optional, keyword-only)
  The HTTP method to use.
- headers: (HttpHeaders | dict[str, str]) | None = None (optional, keyword-only)
  The headers to include in the request.
- payload: HttpPayload | None = None (optional, keyword-only)
  The data to be sent as the request body.
- session: Session | None = None (optional, keyword-only)
  The session associated with the request.
- proxy_info: ProxyInfo | None = None (optional, keyword-only)
  The information about the proxy to be used.
- timeout: timedelta | None = None (optional, keyword-only)
  Maximum time allowed to process the request.
Returns HttpResponse
The HTTP response received from the server.
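
The timeout parameter is a timedelta rather than a number of seconds. The sketch below shows one way such a value could be enforced around an awaitable, using only the standard library. The names slow_operation and run_with_timeout are hypothetical, and this is not necessarily how the client applies the timeout internally.

```python
from __future__ import annotations

import asyncio
from datetime import timedelta


async def slow_operation(delay: float) -> str:
    # Stands in for a network request that takes `delay` seconds.
    await asyncio.sleep(delay)
    return "done"


async def run_with_timeout(delay: float, *, timeout: timedelta | None = None) -> str:
    # Convert the timedelta to seconds; None means "no time limit".
    seconds = timeout.total_seconds() if timeout is not None else None
    try:
        return await asyncio.wait_for(slow_operation(delay), timeout=seconds)
    except asyncio.TimeoutError:
        return "timed out"
```

As in the documented signature, timeout is keyword-only here, so a call looks like run_with_timeout(0.5, timeout=timedelta(milliseconds=20)).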

stream

stream(url, *, method, headers, payload, session, proxy_info, timeout): AbstractAsyncContextManager[HttpResponse]

Overrides HttpClient.stream
Stream an HTTP request via the client.

This method should be used for downloading potentially large data where you need to process the response body in chunks rather than loading it entirely into memory.
Parameters
- url: str
  The URL to send the request to.
- method: HttpMethod = 'GET' (optional, keyword-only)
  The HTTP method to use.
- headers: (HttpHeaders | dict[str, str]) | None = None (optional, keyword-only)
  The headers to include in the request.
- payload: HttpPayload | None = None (optional, keyword-only)
  The data to be sent as the request body.
- session: Session | None = None (optional, keyword-only)
  The session associated with the request.
- proxy_info: ProxyInfo | None = None (optional, keyword-only)
  The information about the proxy to be used.
- timeout: timedelta | None = None (optional, keyword-only)
  The maximum time to wait for establishing the connection.
Returns AbstractAsyncContextManager[HttpResponse]
An async context manager yielding the HTTP response with streaming capabilities.
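
The chunked-processing pattern that stream is meant for can be sketched with stand-ins. StandInResponse, its read_stream method, and stand_in_stream are assumptions made for this example and do not document the real HttpResponse interface. Only the overall shape, an async context manager yielding a response whose body is consumed piece by piece, follows the description above.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class StandInResponse:
    """Hypothetical response object exposing its body as chunks."""

    def __init__(self, chunks: list[bytes]) -> None:
        self._chunks = chunks

    async def read_stream(self) -> AsyncIterator[bytes]:
        # Yield the body one chunk at a time instead of all at once.
        for chunk in self._chunks:
            yield chunk


@asynccontextmanager
async def stand_in_stream(url: str) -> AsyncIterator[StandInResponse]:
    # Hypothetical replacement for client.stream(url); no network is used.
    response = StandInResponse([b"abc", b"defg", b"hi"])
    try:
        yield response
    finally:
        pass  # a real client would release the connection here


async def count_bytes(url: str) -> int:
    total = 0
    async with stand_in_stream(url) as response:
        async for chunk in response.read_stream():
            total += len(chunk)  # process each chunk, then discard it
    return total
```

Because each chunk is handled and dropped inside the loop, memory use stays bounded by the chunk size rather than by the size of the full body.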

Properties

active

active: bool

Indicate whether the context is active.
