BaseHttpClient
crawlee.http_clients._base.BaseHttpClient
Constructors
__init__
Create a new instance.
Parameters
persist_cookies_per_session: bool = True (keyword-only)
additional_http_error_status_codes: Iterable[int] = () (keyword-only)
ignore_http_error_status_codes: Iterable[int] = () (keyword-only)
Returns None
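All three constructor parameters are keyword-only, so they must be passed by name. The following is a minimal sketch, not crawlee's implementation: SketchHttpClient is a hypothetical stand-in that mirrors the documented __init__ signature. As an assumption of this sketch, it stores the two status-code iterables as frozensets so that a one-shot iterable such as a generator is not exhausted and membership checks stay cheap.

```python
from collections.abc import Iterable


class SketchHttpClient:
    """Hypothetical stand-in mirroring the documented BaseHttpClient.__init__ signature."""

    def __init__(
        self,
        *,  # every parameter after the bare '*' is keyword-only
        persist_cookies_per_session: bool = True,
        additional_http_error_status_codes: Iterable[int] = (),
        ignore_http_error_status_codes: Iterable[int] = (),
    ) -> None:
        self._persist_cookies_per_session = persist_cookies_per_session
        # Assumption: materialize the iterables once so they can be reused later.
        self._additional_http_error_status_codes = frozenset(additional_http_error_status_codes)
        self._ignore_http_error_status_codes = frozenset(ignore_http_error_status_codes)


# Passing the options by name works.
client = SketchHttpClient(
    additional_http_error_status_codes=[403],
    ignore_http_error_status_codes=(404,),
)

# Passing an option positionally is rejected by Python itself.
try:
    SketchHttpClient(False)
except TypeError as exc:
    print(f"TypeError: {exc}")
```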
Methods
crawl
Perform the crawling for a given request.
This method is called from crawler.run().
Parameters
request: Request
session: Session | None = None (keyword-only)
proxy_info: ProxyInfo | None = None (keyword-only)
statistics: Statistics | None = None (keyword-only)
Returns HttpCrawlingResult
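Because BaseHttpClient is abstract, each concrete client supplies its own crawl. The sketch below is illustrative only: Request, HttpCrawlingResult, SketchBaseHttpClient and EchoHttpClient are simplified hypothetical stand-ins defined in the snippet (not the crawlee classes), session, proxy_info and statistics are typed as Any, and crawl is assumed to be a coroutine, matching crawlee's asyncio-based design.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Request:
    """Hypothetical stand-in for crawlee's Request; only the URL is modeled."""

    url: str


@dataclass
class HttpCrawlingResult:
    """Hypothetical stand-in; a real result wraps an HTTP response."""

    url: str
    status_code: int


class SketchBaseHttpClient(ABC):
    @abstractmethod
    async def crawl(
        self,
        request: Request,
        *,
        session: Any = None,
        proxy_info: Any = None,
        statistics: Any = None,
    ) -> HttpCrawlingResult:
        """Perform the crawling for a given request."""


class EchoHttpClient(SketchBaseHttpClient):
    """Toy client that reports success for every request without touching the network."""

    async def crawl(self, request, *, session=None, proxy_info=None, statistics=None):
        return HttpCrawlingResult(url=request.url, status_code=200)


result = asyncio.run(EchoHttpClient().crawl(Request(url="https://example.com")))
print(result)  # HttpCrawlingResult(url='https://example.com', status_code=200)
```

Trying to instantiate SketchBaseHttpClient directly raises TypeError, which is how abc enforces that subclasses implement crawl.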
send_request
Send an HTTP request via the client.
This method is called from the context.send_request() helper.
Parameters
url: str
method: HttpMethod = 'GET' (keyword-only)
headers: HttpHeaders | None = None (keyword-only)
query_params: HttpQueryParams | None = None (keyword-only)
data: dict[str, Any] | None = None (keyword-only)
session: Session | None = None (keyword-only)
proxy_info: ProxyInfo | None = None (keyword-only)
Returns HttpResponse
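send_request receives the URL and the query parameters separately. How a concrete client combines them is up to that client, and many simply hand both to their underlying HTTP library. The helper below, build_request_url, is a hypothetical illustration using only the standard library. It assumes HttpQueryParams behaves like a mapping of string keys to string values and appends the new parameters after any query string already present in the URL.

```python
from __future__ import annotations

from collections.abc import Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_request_url(url: str, query_params: Mapping[str, str] | None = None) -> str:
    """Hypothetical helper: merge query_params into the query string of url."""
    if not query_params:
        return url
    parts = urlsplit(url)
    # Keep the parameters already in the URL, then add the new ones after them.
    merged = parse_qsl(parts.query, keep_blank_values=True) + list(query_params.items())
    return urlunsplit(parts._replace(query=urlencode(merged)))


print(build_request_url("https://example.com/search?q=crawlee", {"page": "2"}))
# https://example.com/search?q=crawlee&page=2
```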
An abstract base class for HTTP clients used in crawlers (BasicCrawler subclasses). The specific HTTP client should use the _raise_for_error_status_code method for checking the status code. This way, consistent behavior across different HTTP clients can be maintained.
It raises an HttpStatusCodeError when it encounters an error response, defined by default as any HTTP status code in the range of 400 to 599. The error handling behavior is customizable, allowing the user to specify additional status codes to treat as errors or to exclude specific status codes from being considered errors. See the additional_http_error_status_codes and ignore_http_error_status_codes arguments in the constructor.
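The default rule and the two constructor overrides can be captured in a few lines. This is a minimal sketch, not crawlee's _raise_for_error_status_code: HttpStatusCodeError here is a local stand-in class, raise_for_error_status_code is a hypothetical name, and the precedence applied when a code appears in both collections (the explicitly added code wins) is an assumption of this sketch.

```python
from collections.abc import Iterable


class HttpStatusCodeError(Exception):
    """Local stand-in for crawlee's HttpStatusCodeError."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


def raise_for_error_status_code(
    status_code: int,
    *,
    additional_http_error_status_codes: Iterable[int] = (),
    ignore_http_error_status_codes: Iterable[int] = (),
) -> None:
    """Raise HttpStatusCodeError if status_code counts as an error, else return None."""
    if status_code in set(additional_http_error_status_codes):
        # Assumed precedence: user-configured error codes always count as errors.
        raise HttpStatusCodeError("Error status code (user-configured) returned.", status_code)
    is_default_error = 400 <= status_code <= 599
    if is_default_error and status_code not in set(ignore_http_error_status_codes):
        raise HttpStatusCodeError("Error status code returned.", status_code)


raise_for_error_status_code(200)  # success code: no exception
raise_for_error_status_code(404, ignore_http_error_status_codes=[404])  # ignored: no exception
try:
    raise_for_error_status_code(503)
except HttpStatusCodeError as exc:
    print(exc.status_code)  # 503
```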