
EnqueueLinksFunction

A function for enqueueing new URLs to crawl based on elements selected by a given selector.

It extracts URLs from the current page and enqueues them for further crawling. The set of enqueued URLs can be narrowed with a selector and the other options listed below. You can also specify a label and user data to be associated with the newly created Request objects.

Methods

__call__

  • __call__(*, selector, label, user_data, **kwargs): Coroutine[None, None, None]
  • Extracts links from the current page and enqueues them. The call returns a coroutine, which must be awaited.


    Parameters

    • optional, keyword-only selector: str = 'a'

      A selector used to find the elements containing the links. The supported selector syntax depends on the crawler:

      • PlaywrightCrawler supports CSS and XPath selectors.
      • ParselCrawler supports CSS selectors.
      • BeautifulSoupCrawler supports CSS selectors.
    • optional, keyword-only label: str | None = None

      Label for the newly created Request objects, used for request routing.

    • optional, keyword-only user_data: dict[str, Any] | None = None

      User data to be provided to the newly created Request objects.

    • keyword-only limit: int

      Maximum number of requests to be enqueued.

    • keyword-only base_url: str

      Base URL used to resolve relative URLs.

    • keyword-only strategy: EnqueueStrategy

      Enqueueing strategy, see the EnqueueStrategy enum for possible values and their meanings.

    • keyword-only include: list[re.Pattern | Glob]

      List of regular expressions or globs that URLs must match to be enqueued.

    • keyword-only exclude: list[re.Pattern | Glob]

      List of regular expressions or globs that URLs must not match to be enqueued.

    Returns Coroutine[None, None, None]
