AddRequestsKwargs
Hierarchy
- EnqueueLinksKwargs
- AddRequestsKwargs
Properties
base_url
Base URL to be used for relative URLs.
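A minimal sketch of the idea behind base_url. It assumes relative links are resolved with standard URL-joining rules, here Python's urllib.parse.urljoin. resolve_links is a hypothetical helper and is not part of the library:

```python
from urllib.parse import urljoin


def resolve_links(hrefs: list[str], base_url: str) -> list[str]:
    """Hypothetical helper: make each href absolute by joining it onto base_url."""
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # ("/about", "contact.html", "../up") against base_url.
    return [urljoin(base_url, href) for href in hrefs]


if __name__ == "__main__":
    print(resolve_links(["/about", "contact.html"], "https://example.com/docs/"))
    # ['https://example.com/about', 'https://example.com/docs/contact.html']
```

Absolute links pass through unchanged, so in this sketch base_url only affects relative ones.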
exclude
List of regular expressions or globs that URLs must not match to be enqueued.
include
List of regular expressions or globs that URLs must match to be enqueued.
limit
Maximum number of requests to be enqueued.
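The include, exclude and limit properties can be pictured as a filter over candidate URLs. The sketch below is hypothetical. It assumes that a URL must match at least one include pattern and no exclude pattern, treats plain strings as globs (via fnmatch) and compiled patterns as regular expressions, and applies limit after filtering. The library's own glob syntax and evaluation order may differ:

```python
from __future__ import annotations

import fnmatch
import re
from typing import Iterable, Union

# Assumption for this sketch: a str is a glob, a compiled pattern is a regex.
UrlFilter = Union[str, re.Pattern]


def _matches(url: str, flt: UrlFilter) -> bool:
    """Return True if url matches a glob string or a compiled regular expression."""
    if isinstance(flt, str):
        return fnmatch.fnmatchcase(url, flt)
    return flt.search(url) is not None


def filter_links(
    urls: Iterable[str],
    include: list[UrlFilter] | None = None,
    exclude: list[UrlFilter] | None = None,
    limit: int | None = None,
) -> list[str]:
    """Hypothetical filter: keep URLs matching any include and no exclude, up to limit."""
    selected: list[str] = []
    for url in urls:
        if limit is not None and len(selected) >= limit:
            break  # Cap reached; stop collecting.
        if include and not any(_matches(url, f) for f in include):
            continue  # Did not match any include pattern.
        if exclude and any(_matches(url, f) for f in exclude):
            continue  # Matched an exclude pattern.
        selected.append(url)
    return selected


if __name__ == "__main__":
    candidates = [
        "https://example.com/blog/a",
        "https://example.com/blog/draft-1",
        "https://example.com/shop/x",
    ]
    print(filter_links(candidates, include=["https://example.com/blog/*"], exclude=[re.compile(r"draft")]))
    # ['https://example.com/blog/a']
```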
requests
Requests to be added to the RequestManager.
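A sketch of preparing the requests value before it is added. It assumes entries may be either plain URL strings or request objects; SimpleRequest and normalize_requests are hypothetical stand-ins, not the library's Request class or API:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class SimpleRequest:
    """Hypothetical stand-in for a request object (not the library's Request)."""
    url: str
    method: str = "GET"


def normalize_requests(requests: Sequence[str | SimpleRequest]) -> list[SimpleRequest]:
    """Wrap plain URL strings in SimpleRequest; pass request objects through unchanged."""
    return [SimpleRequest(url=r) if isinstance(r, str) else r for r in requests]


if __name__ == "__main__":
    batch = normalize_requests(["https://a.example/", SimpleRequest("https://b.example/", method="POST")])
    print([(r.url, r.method) for r in batch])
    # [('https://a.example/', 'GET'), ('https://b.example/', 'POST')]
```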
rq_alias
Alias of the RequestQueue to add the requests to. Only one of rq_id, rq_name or rq_alias can be provided.
rq_id
ID of the RequestQueue to add the requests to. Only one of rq_id, rq_name or rq_alias can be provided.
rq_name
Name of the RequestQueue to add the requests to. Only one of rq_id, rq_name or rq_alias can be provided.
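The rule that only one of rq_id, rq_name or rq_alias can be provided could be enforced with a check like the one below. validate_rq_target is a hypothetical helper, and it assumes that a key explicitly set to None counts as not provided:

```python
from __future__ import annotations

from typing import Any, Mapping

_RQ_KEYS = ("rq_id", "rq_name", "rq_alias")


def validate_rq_target(kwargs: Mapping[str, Any]) -> str | None:
    """Return the single queue selector that was given, or None; raise if several are set."""
    # Assumption: None values are treated the same as an absent key.
    provided = [key for key in _RQ_KEYS if kwargs.get(key) is not None]
    if len(provided) > 1:
        raise ValueError(f"Only one of rq_id, rq_name or rq_alias can be provided, got: {', '.join(provided)}")
    return provided[0] if provided else None


if __name__ == "__main__":
    print(validate_rq_target({"rq_name": "products"}))  # rq_name
```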
strategy
Enqueue strategy to be used for determining which links to extract and enqueue.
Options:
- all: Enqueue every link encountered, regardless of the target domain. Use this option to ensure that all links, including those leading to external websites, are followed.
- same-domain: Enqueue links that share the same domain name as the current page, including any subdomains. This strategy is ideal for crawling within a single domain (for example, example.com) while still allowing subdomains such as blog.example.com to be explored.
- same-hostname: Enqueue links only if they match the exact hostname of the current page. This is the default behavior and restricts the crawl to the current hostname, excluding subdomains.
- same-origin: Enqueue links that share the same origin as the current page. The origin is the combination of protocol, hostname, and port, which gives this option the narrowest scope of the four.
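A rough sketch of how the four strategies could be evaluated for a candidate link, built on urllib.parse. matches_strategy is hypothetical, not the library's implementation. same-domain is approximated by comparing the last two hostname labels, which is wrong for suffixes such as co.uk; a faithful version would consult the Public Suffix List. Default ports (80 for http, 443 for https) are filled in so that https://example.com and https://example.com:443 count as the same origin:

```python
from __future__ import annotations

from typing import Literal
from urllib.parse import urlsplit

Strategy = Literal["all", "same-domain", "same-hostname", "same-origin"]

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _naive_domain(hostname: str) -> str:
    """Approximate the registered domain as the last two labels (simplification)."""
    return ".".join(hostname.split(".")[-2:])


def _origin(url: str) -> tuple[str, str, int | None]:
    """Return (scheme, hostname, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname or "", port


def matches_strategy(current_url: str, target_url: str, strategy: Strategy = "same-hostname") -> bool:
    """Hypothetical check: is target_url in scope relative to current_url?"""
    current_host = urlsplit(current_url).hostname or ""
    target_host = urlsplit(target_url).hostname or ""
    if strategy == "all":
        return True
    if strategy == "same-hostname":
        return current_host == target_host
    if strategy == "same-domain":
        return _naive_domain(current_host) == _naive_domain(target_host)
    if strategy == "same-origin":
        return _origin(current_url) == _origin(target_url)
    raise ValueError(f"Unknown strategy: {strategy}")


if __name__ == "__main__":
    page = "https://example.com/page"
    print(matches_strategy(page, "https://blog.example.com/a", "same-domain"))  # True
    print(matches_strategy(page, "https://blog.example.com/a"))  # False (same-hostname)
```

Each option in this sketch accepts a subset of what the previous, broader option accepts: same-origin within same-hostname within same-domain within all.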
Keyword arguments for the add_requests methods.
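As the Hierarchy section indicates, AddRequestsKwargs extends EnqueueLinksKwargs. One way to model that relationship in Python is a pair of TypedDict classes. The sketch below is hypothetical: the Sketch suffix marks the classes as illustrations, every key is optional, and the field types (for example Sequence[str] for requests) are assumptions rather than the library's exact annotations. describe_call is likewise a hypothetical consumer:

```python
import re
from typing import Literal, Sequence, TypedDict, Union


class EnqueueLinksKwargsSketch(TypedDict, total=False):
    """Hypothetical mirror of the inherited EnqueueLinksKwargs properties."""
    base_url: str
    exclude: list[Union[str, re.Pattern]]
    include: list[Union[str, re.Pattern]]
    limit: int
    strategy: Literal["all", "same-domain", "same-hostname", "same-origin"]


class AddRequestsKwargsSketch(EnqueueLinksKwargsSketch, total=False):
    """Hypothetical mirror of AddRequestsKwargs: adds requests and queue selection."""
    requests: Sequence[str]
    rq_id: str
    rq_name: str
    rq_alias: str


def describe_call(**kwargs: object) -> list[str]:
    """Hypothetical consumer: reject unknown keys and report which known ones were passed."""
    known = AddRequestsKwargsSketch.__optional_keys__  # Includes inherited keys.
    unknown = set(kwargs) - known
    if unknown:
        raise TypeError(f"Unexpected keyword arguments: {sorted(unknown)}")
    return sorted(kwargs)


if __name__ == "__main__":
    options: AddRequestsKwargsSketch = {
        "requests": ["https://example.com/"],
        "rq_name": "products",
        "strategy": "same-domain",
    }
    print(describe_call(**options))  # ['requests', 'rq_name', 'strategy']
```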