
ProxyConfiguration

crawlee.proxy_configuration.ProxyConfiguration

Configures connection to a proxy server with the provided options.

Proxy servers are used to prevent target websites from blocking your crawlers based on IP address rate limits or blacklists. Setting a proxy configuration in your crawler automatically configures it to use the selected proxies for all connections. To see which proxy is currently in use, inspect the ProxyInfo object in your crawler's request handler; it exposes the proxy's URL and other attributes.

If you want to use your own proxies, pass them in the proxy_urls option. When this option is provided, the configuration rotates through your list of proxy URLs.
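To illustrate what rotating a list of proxy URLs means, here is a minimal standard-library sketch. It is a simplified model, not Crawlee's implementation: the round-robin order is an assumption, and the `make_rotator` helper and the example URLs are hypothetical.

```python
from itertools import cycle

# Hypothetical proxy URLs, used only for illustration.
PROXY_URLS = [
    "http://proxy-1.example.com:8000",
    "http://proxy-2.example.com:8000",
    "http://proxy-3.example.com:8000",
]


def make_rotator(proxy_urls):
    """Return a function that hands out the next proxy URL on each call."""
    pool = cycle(proxy_urls)  # endless round-robin iterator over the list
    return lambda: next(pool)


next_proxy = make_rotator(PROXY_URLS)

# Four picks from a three-item list: the fourth wraps around to the first.
picked = [next_proxy() for _ in range(4)]
print(picked)
```

With Crawlee itself, the same list would be passed as the proxy_urls argument of ProxyConfiguration, and the library would handle the rotation.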

Index

Constructors

__init__

  • __init__(*, proxy_urls, new_url_function, tiered_proxy_urls): None
  • Initialize a proxy configuration object.

    Exactly one of proxy_urls, tiered_proxy_urls or new_url_function must be specified.


    Parameters

    • proxy_urls: list[str] | None = None (keyword-only)
    • new_url_function: _NewUrlFunction | None = None (keyword-only)
    • tiered_proxy_urls: list[list[str]] | None = None (keyword-only)

    Returns None
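The "exactly one" rule for the three constructor options can be sketched as a small check. This is a hedged illustration only: `check_exactly_one` is a hypothetical helper, and raising `ValueError` is an assumption about how such a violation might be reported, not a statement about the library's behavior.

```python
def check_exactly_one(proxy_urls=None, new_url_function=None, tiered_proxy_urls=None):
    """Return the name of the single option that was provided.

    Raises ValueError (an assumed error type) when zero or several
    options are given.
    """
    options = {
        "proxy_urls": proxy_urls,
        "new_url_function": new_url_function,
        "tiered_proxy_urls": tiered_proxy_urls,
    }
    provided = [name for name, value in options.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            "Exactly one of proxy_urls, tiered_proxy_urls or new_url_function "
            f"must be specified, got: {provided or 'none'}"
        )
    return provided[0]


# Valid: only one option is set (the URL is a hypothetical example).
chosen = check_exactly_one(proxy_urls=["http://proxy-1.example.com:8000"])
print(chosen)
```

Mixing options, for example a flat proxy_urls list together with tiered_proxy_urls, would fail this check, as would passing none of them.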

Methods

new_proxy_info

  • async new_proxy_info(session_id, request, proxy_tier): ProxyInfo | None
  • Return a new ProxyInfo object.

    If called repeatedly with the same request, the request is assumed to be a retry. If a previously used session ID is passed, the same proxy URL is returned.


    Parameters

    • session_id: str | None
    • request: Request | None
    • proxy_tier: int | None

    Returns ProxyInfo | None

new_url

  • async new_url(session_id, request, proxy_tier): str | None
  • Return a new proxy URL.

    If called repeatedly with the same request, the request is assumed to be a retry. If a previously used session ID is passed, the same proxy URL is returned.


    Parameters

    • session_id: str | None = None
    • request: Request | None = None
    • proxy_tier: int | None = None

    Returns str | None
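The session behavior described for new_proxy_info and new_url (a previously used session ID gets the same proxy URL back) can be modeled in a few lines. This is a simplified, synchronous sketch under stated assumptions: `StickyRotator` is a hypothetical class, the real methods are async, and request and proxy_tier handling is left out entirely.

```python
class StickyRotator:
    """Toy model: rotate proxy URLs, but pin each session ID to one URL."""

    def __init__(self, proxy_urls):
        self._proxy_urls = list(proxy_urls)
        self._next_index = 0
        self._session_urls = {}  # session_id -> proxy URL pinned to it

    def _rotate(self):
        # Pick the next URL in round-robin order (an assumed strategy).
        url = self._proxy_urls[self._next_index % len(self._proxy_urls)]
        self._next_index += 1
        return url

    def new_url(self, session_id=None):
        if session_id is None:
            return self._rotate()  # no session: plain rotation
        if session_id not in self._session_urls:
            self._session_urls[session_id] = self._rotate()
        return self._session_urls[session_id]


# Hypothetical proxy URLs, used only for illustration.
rotator = StickyRotator([
    "http://proxy-1.example.com:8000",
    "http://proxy-2.example.com:8000",
])

first = rotator.new_url("session-a")
other = rotator.new_url("session-b")
again = rotator.new_url("session-a")  # same session ID as the first call
print(first == again, first != other)
```

Pinning a session to one proxy keeps cookies and other session state tied to a single IP address, which is why reusing a session ID should not trigger a new rotation step.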