ProxyConfiguration
Methods
__init__
A default constructor.
Exactly one of proxy_urls, tiered_proxy_urls, or new_url_function must be specified.
Parameters
proxy_urls (optional, keyword-only): list[str | None] | None = None
A list of proxy URLs that will be rotated in a round-robin fashion.
new_url_function (optional, keyword-only): _NewUrlFunction | None = None
A function that returns a proxy URL for a given Request. This provides full control over the proxy selection mechanism.
tiered_proxy_urls (optional, keyword-only): list[list[str | None]] | None = None
A list of URL tiers (where a tier is a list of proxy URLs). Crawlers will automatically try to use the lowest tier (smallest index) where blocking does not happen. The proxy URLs in the selected tier will be rotated in a round-robin fashion.
Returns None
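The constructor rules above can be illustrated with a small standalone sketch. It is not Crawlee's implementation: the class RoundRobinProxySketch and its next_url method are hypothetical, and treating a None entry in proxy_urls as "no proxy for this turn" is an assumption based only on the str | None element type.

```python
from itertools import cycle
from typing import Callable, Iterator, Optional

# Hypothetical stand-in for the _NewUrlFunction type named in the signature.
NewUrlFunction = Callable[..., Optional[str]]


class RoundRobinProxySketch:
    """Illustrative sketch only, not Crawlee's actual ProxyConfiguration."""

    def __init__(
        self,
        *,
        proxy_urls: Optional[list[Optional[str]]] = None,
        new_url_function: Optional[NewUrlFunction] = None,
        tiered_proxy_urls: Optional[list[list[Optional[str]]]] = None,
    ) -> None:
        given = [
            option
            for option in (proxy_urls, new_url_function, tiered_proxy_urls)
            if option is not None
        ]
        # Documented rule: exactly one of the three options must be specified.
        if len(given) != 1:
            raise ValueError(
                "Exactly one of proxy_urls, tiered_proxy_urls or "
                "new_url_function must be specified."
            )
        # Sketch's own choice: reject an empty list instead of cycling over nothing.
        if proxy_urls is not None and not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._rotation: Optional[Iterator[Optional[str]]] = (
            cycle(proxy_urls) if proxy_urls is not None else None
        )

    def next_url(self) -> Optional[str]:
        # Round-robin: return the next URL, wrapping around after the last one.
        # A None entry is assumed to mean "connect without a proxy".
        if self._rotation is None:
            raise NotImplementedError("This sketch only rotates proxy_urls.")
        return next(self._rotation)
```

For example, with proxy_urls=["http://proxy-1.example.com:8000", None], successive next_url() calls return the proxy URL, then None, then the proxy URL again.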
new_proxy_info
Returns a new ProxyInfo object based on the configured proxy rotation strategy.
Parameters
session_id: str | None
Session identifier. If provided, the same proxy URL will be returned for subsequent calls with this ID. If not provided, it will be auto-generated for tiered proxies.
request: Request | None
Request object used for proxy rotation and tier selection. Required for tiered proxies to track retries and adjust the tier accordingly.
proxy_tier: int | None
Specific proxy tier to use. If not provided, the tier will be selected automatically based on the configuration.
Returns ProxyInfo | None
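The tier-related parameters above can be pictured with a deliberately simplified sketch: start at the lowest tier (index 0), move up one tier for each retry of the same request, and rotate round-robin inside the chosen tier. The names TieredProxySketch and RequestStub are hypothetical, and the retry-count heuristic is a stand-in; Crawlee's actual tier-selection logic may differ, so treat this only as an illustration of the idea.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class RequestStub:
    # Hypothetical minimal request with only the fields this sketch needs.
    url: str
    retry_count: int = 0


class TieredProxySketch:
    """Simplified tier selection; not Crawlee's real heuristic."""

    def __init__(self, tiered_proxy_urls: list[list[Optional[str]]]) -> None:
        if not tiered_proxy_urls or any(not tier for tier in tiered_proxy_urls):
            raise ValueError("every tier must contain at least one URL")
        self._tiers = tiered_proxy_urls
        # One independent round-robin counter per tier.
        self._counters = [count() for _ in tiered_proxy_urls]

    def new_url(
        self, request: RequestStub, proxy_tier: Optional[int] = None
    ) -> Optional[str]:
        if proxy_tier is None:
            # Start at the lowest tier; each retry escalates one tier,
            # capped at the last tier.
            proxy_tier = min(request.retry_count, len(self._tiers) - 1)
        tier = self._tiers[proxy_tier]
        # Round-robin within the selected tier.
        index = next(self._counters[proxy_tier]) % len(tier)
        return tier[index]
```

Here a first attempt uses tier 0 (possibly no proxy at all), while a request that keeps getting retried is pushed toward the last, presumably most robust, tier. Passing proxy_tier explicitly bypasses the heuristic.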
new_url
Returns a proxy URL string based on the configured proxy rotation strategy.
Parameters
session_id (optional): str | None = None
Session identifier. If provided, the same proxy URL will be returned for subsequent calls with this ID. If not provided, it will be auto-generated for tiered proxies.
request (optional): Request | None = None
Request object used for proxy rotation and tier selection. Required for tiered proxies to track retries and adjust the tier accordingly.
proxy_tier (optional): int | None = None
Specific proxy tier to use. If not provided, the tier will be selected automatically based on the configuration.
Returns str | None
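The session_id behavior described above (the same ID keeps returning the same proxy URL) can be sketched as a cache in front of a round-robin rotation. StickySessionProxySketch is a hypothetical, synchronous class used only for illustration; it does not model tiers or automatic session ID generation.

```python
from itertools import cycle
from typing import Optional


class StickySessionProxySketch:
    """Hypothetical sketch of session-sticky round-robin rotation."""

    def __init__(self, proxy_urls: list[Optional[str]]) -> None:
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._rotation = cycle(proxy_urls)
        # Maps each session ID to the proxy URL it was first assigned.
        self._by_session: dict[str, Optional[str]] = {}

    def new_url(self, session_id: Optional[str] = None) -> Optional[str]:
        if session_id is None:
            # No session: just take the next URL in round-robin order.
            return next(self._rotation)
        if session_id not in self._by_session:
            # First call for this session: assign the next URL and remember it.
            self._by_session[session_id] = next(self._rotation)
        return self._by_session[session_id]
```

Calls without a session keep advancing the shared rotation, so a new session simply receives whichever URL is next in line when it first appears.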
Configures the connection to a proxy server with the provided options.
Proxy servers are used to prevent target websites from blocking your crawlers based on IP address rate limits or blacklists. Setting a proxy configuration on your crawler automatically configures it to use the selected proxies for all connections. You can get information about the proxy currently in use by inspecting the ProxyInfo object in your crawler's request handler, where you can read the proxy's URL and other attributes.
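To illustrate what "the proxy's URL and other attributes" can mean, the sketch below splits a proxy URL into its components with the standard library. ProxyInfoSketch and parse_proxy_info are hypothetical names, and the field set is an assumption; consult the ProxyInfo reference for the attributes it actually exposes.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlsplit


@dataclass(frozen=True)
class ProxyInfoSketch:
    # Hypothetical fields; the real ProxyInfo attribute set may differ.
    url: str
    scheme: str
    hostname: str
    port: int
    username: str = ""
    password: str = ""
    session_id: Optional[str] = None


def parse_proxy_info(url: str, session_id: Optional[str] = None) -> ProxyInfoSketch:
    parts = urlsplit(url)
    if not parts.hostname or parts.port is None:
        raise ValueError(f"proxy URL needs a host and an explicit port: {url!r}")
    return ProxyInfoSketch(
        url=url,
        scheme=parts.scheme,
        hostname=parts.hostname,
        port=parts.port,
        # urlsplit leaves credentials percent-encoded, so decode them here.
        username=unquote(parts.username or ""),
        password=unquote(parts.password or ""),
        session_id=session_id,
    )
```

For example, parse_proxy_info("http://user:p%40ss@proxy.example.com:8000") yields hostname "proxy.example.com", port 8000, and password "p@ss".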
If you want to use your own proxies, use the proxy_urls parameter. When it is provided, the configuration rotates through your list of proxy URLs.