EnqueueLinksFunction

A function for enqueueing new URLs to crawl, taken either from elements matched by a given selector or from explicitly passed requests.

It adds explicitly passed requests to the RequestManager or it extracts URLs from the current page and enqueues them for further crawling. It allows filtering through selectors and other options. You can also specify labels and user data to be associated with the newly created Request objects.

It should not be called with the selector, label, user_data, or transform_request_function arguments together with the requests argument.
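The rule above can be illustrated with a small stand-in. This is only a sketch: `enqueue_links_stub` is a hypothetical function, not Crawlee's implementation, and raising `ValueError` on mixed arguments is an assumption about how misuse might be reported. It mirrors a subset of the keyword-only parameters listed on this page and reports which of the two call modes was selected.

```python
import asyncio
from typing import Any, Callable, Optional, Sequence


async def enqueue_links_stub(
    *,
    selector: Optional[str] = None,
    label: Optional[str] = None,
    user_data: Optional[dict[str, Any]] = None,
    transform_request_function: Optional[Callable[[dict], Any]] = None,
    requests: Optional[Sequence[str]] = None,
) -> str:
    """Hypothetical stand-in that only reports which call mode was chosen."""
    extraction_args = (selector, label, user_data, transform_request_function)
    # Assumption: mixing `requests` with extraction arguments is treated as an error.
    if requests is not None and any(arg is not None for arg in extraction_args):
        raise ValueError("Pass either `requests` or the extraction arguments, not both.")
    return "explicit-requests" if requests is not None else "extract-from-page"


async def main() -> list[str]:
    modes = [
        # Mode 1: extract links from the page, optionally narrowed by a selector.
        await enqueue_links_stub(selector="a.product", label="DETAIL"),
        # Mode 2: enqueue explicitly passed requests.
        await enqueue_links_stub(requests=["https://example.com/a"]),
    ]
    try:
        # Mixing both modes is rejected by this sketch.
        await enqueue_links_stub(selector="a", requests=["https://example.com/a"])
    except ValueError:
        modes.append("rejected")
    return modes


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a real request handler the same two modes would presumably be reached by awaiting the context's enqueue-links callable with one set of arguments or the other.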

For even more control over the enqueued links, you can use a combination of ExtractLinksFunction and AddRequestsFunction.

Methods

__call__

__call__(*, selector?: str | None, label?: str | None, user_data?: dict[str, Any] | None, transform_request_function?: Callable[[RequestOptions], RequestOptions | RequestTransformAction] | None, requests?: Sequence[str | Request] | None, limit: NotRequired[int], base_url: NotRequired[str], strategy: NotRequired[EnqueueStrategy], include: NotRequired[list[re.Pattern | Glob]], exclude: NotRequired[list[re.Pattern | Glob]]): Coroutine[None, None, None]
__call__(*, selector?: str | None, label?: str | None, user_data?: dict[str, Any] | None, transform_request_function?: Callable[[RequestOptions], RequestOptions | RequestTransformAction] | None, limit: NotRequired[int], base_url: NotRequired[str], strategy: NotRequired[EnqueueStrategy], include: NotRequired[list[re.Pattern | Glob]], exclude: NotRequired[list[re.Pattern | Glob]]): Coroutine[None, None, None]
__call__(*, requests?: Sequence[str | Request] | None, limit: NotRequired[int], base_url: NotRequired[str], strategy: NotRequired[EnqueueStrategy], include: NotRequired[list[re.Pattern | Glob]], exclude: NotRequired[list[re.Pattern | Glob]]): Coroutine[None, None, None]

Call enqueue links function.
Parameters
- selector: str | None = None (optional, keyword-only)
  A selector used to find the elements containing the links. The behaviour differs based on the crawler used:
  - PlaywrightCrawler supports CSS and XPath selectors.
  - ParselCrawler supports CSS selectors.
  - BeautifulSoupCrawler supports CSS selectors.
- label: str | None = None (optional, keyword-only)
  Label for the newly created Request objects, used for request routing.
- user_data: dict[str, Any] | None = None (optional, keyword-only)
  User data to be provided to the newly created Request objects.
- transform_request_function: Callable[[RequestOptions], RequestOptions | RequestTransformAction] | None = None (optional, keyword-only)
  A function that takes RequestOptions and returns either:
  - Modified RequestOptions to update the request configuration,
  - 'skip' to exclude the request from being enqueued,
  - 'unchanged' to use the original request options without modification.
- requests: Sequence[str | Request] | None = None (optional, keyword-only)
  Requests to be added to the RequestManager.
- limit: int (optional, keyword-only)
  Maximum number of requests to be enqueued.
- base_url: str (optional, keyword-only)
  Base URL to be used for relative URLs.
- strategy: EnqueueStrategy (optional, keyword-only)
  Enqueue strategy to be used for determining which links to extract and enqueue.
  Options:
  - all: Enqueue every link encountered, regardless of the target domain. Use this option to ensure that all links, including those leading to external websites, are followed.
  - same-domain: Enqueue links that share the same domain name as the current page, including any subdomains. This strategy is ideal for crawling within the same top-level domain while still allowing for subdomain exploration.
  - same-hostname: Enqueue links only if they match the exact hostname of the current page. This is the default behavior and restricts the crawl to the current hostname, excluding subdomains.
  - same-origin: Enqueue links that share the same origin as the current page. The origin is defined by the combination of protocol, domain, and port, ensuring a strict scope for the crawl.
- include: list[re.Pattern | Glob] (optional, keyword-only)
  List of regular expressions or globs that URLs must match to be enqueued.
- exclude: list[re.Pattern | Glob] (optional, keyword-only)
  List of regular expressions or globs that URLs must not match to be enqueued.
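
How base_url, strategy, include, exclude, and limit might interact can be sketched with the standard library alone. Everything below is an illustrative assumption rather than Crawlee's code: `same_scope`, `matches`, and `select_links` are hypothetical helpers, plain strings matched with `fnmatch` stand in for Glob objects, regular expressions are applied with `search`, the order of the checks is a guess, and the same-domain check naively compares the last two hostname labels instead of consulting a public suffix list.

```python
import fnmatch
import re
from urllib.parse import urljoin, urlsplit


def same_scope(url: str, page_url: str, strategy: str) -> bool:
    """Approximate the four strategies described above (hypothetical helper)."""
    link, page = urlsplit(url), urlsplit(page_url)
    if strategy == "all":
        return True
    if strategy == "same-hostname":
        return link.hostname == page.hostname
    if strategy == "same-origin":
        # Simplification: an explicit default port is not normalized here.
        return (link.scheme, link.hostname, link.port) == (page.scheme, page.hostname, page.port)
    if strategy == "same-domain":
        # Simplification: treat the last two labels as the domain name.
        def domain(host):
            return ".".join((host or "").split(".")[-2:])
        return domain(link.hostname) == domain(page.hostname)
    raise ValueError(f"Unknown strategy: {strategy}")


def matches(url: str, patterns) -> bool:
    """True if any compiled regex or glob string matches the URL."""
    for pattern in patterns:
        if isinstance(pattern, re.Pattern):
            if pattern.search(url):
                return True
        elif fnmatch.fnmatchcase(url, pattern):
            return True
    return False


def select_links(hrefs, page_url, *, base_url=None, strategy="same-hostname",
                 include=None, exclude=None, limit=None):
    """Resolve, scope, and filter candidate links, stopping at `limit`."""
    selected = []
    for href in hrefs:
        if limit is not None and len(selected) >= limit:
            break
        url = urljoin(base_url or page_url, href)  # resolve relative URLs
        if not same_scope(url, page_url, strategy):
            continue
        if include and not matches(url, include):
            continue
        if exclude and matches(url, exclude):
            continue
        selected.append(url)
    return selected


page = "https://example.com/blog/"
hrefs = ["post-1", "/blog/post-2", "/login",
         "https://shop.example.com/item", "https://other.org/"]

blog_only = select_links(hrefs, page, include=[re.compile(r"/blog/")], exclude=["*/post-2"])
first_three = select_links(hrefs, page, strategy="same-domain", limit=3)
```

With these inputs, `blog_only` keeps only the first blog post, while `first_three` stops after three links even though the subdomain link would also qualify under same-domain.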
Returns Coroutine[None, None, None]
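
The three outcomes of transform_request_function described in the parameter list can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dict stands in for RequestOptions, the keys `url`, `label`, and `user_data` are assumed to be present in it, and `apply_transform` is a hypothetical helper that shows one way the return values could be interpreted, not how Crawlee processes them internally.

```python
from typing import Any, Callable, Union

# Stand-in for RequestOptions; the real type is assumed to be dict-like.
Options = dict[str, Any]


def transform(options: Options) -> Union[Options, str]:
    """Example transform: skip PDFs, label product pages, leave the rest alone."""
    url = options["url"]
    if url.endswith(".pdf"):
        return "skip"
    if "/product/" in url:
        # Return a modified copy instead of mutating the input.
        return {**options, "label": "PRODUCT", "user_data": {"source": "listing"}}
    return "unchanged"


def apply_transform(candidates: list[Options],
                    fn: Callable[[Options], Union[Options, str]]) -> list[Options]:
    """Hypothetical helper: interpret 'skip', 'unchanged', or modified options."""
    result = []
    for options in candidates:
        outcome = fn(options)
        if outcome == "skip":
            continue  # excluded from enqueueing
        result.append(options if outcome == "unchanged" else outcome)
    return result


candidates = [
    {"url": "https://example.com/product/1"},
    {"url": "https://example.com/manual.pdf"},
    {"url": "https://example.com/about"},
]
enqueued = apply_transform(candidates, transform)
```

A function shaped like `transform` is what would be passed as the transform_request_function argument; returning a copy keeps the original options intact for the 'unchanged' case.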
