Version: Next

SitemapRequestList

A list of URLs to crawl parsed from a sitemap.

The loading of the sitemap is performed in the background so that crawling can start before the sitemap is fully loaded.

Implements

  • IRequestList

Methods

[asyncIterator]

  • [asyncIterator](): AsyncGenerator<Request<Dictionary>, void, unknown>
  • Yields the next Request to process. First, the generator yields a request previously reclaimed using the RequestList.reclaimRequest function, if there is one. Otherwise it yields the next request from sources.

    The generator completes when there are no more requests to process.

    Can be used to iterate over the RequestList instance in a for await...of loop, as an alternative to calling fetchNextRequest repeatedly.


    Returns AsyncGenerator<Request<Dictionary>, void, unknown>
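
    The for await...of usage described above can be sketched as follows. This is a minimal illustration, not the library's implementation: RequestLike, IterableRequestList, collectUrls and createFakeList are hypothetical names introduced here, and the in-memory list exists only so the loop can run without a real sitemap.

```typescript
// Minimal structural stand-ins (assumptions for this sketch, not the library's real types).
interface RequestLike {
    url: string;
}

interface IterableRequestList {
    [Symbol.asyncIterator](): AsyncGenerator<RequestLike, void, unknown>;
    markRequestHandled(request: RequestLike): Promise<void>;
}

// Collects every URL the list yields and marks each request as handled.
async function collectUrls(list: IterableRequestList): Promise<string[]> {
    const urls: string[] = [];
    for await (const request of list) {
        urls.push(request.url);
        await list.markRequestHandled(request);
    }
    return urls;
}

// Hypothetical in-memory list, used only to exercise the loop above.
function createFakeList(urls: string[]): IterableRequestList & { handled: string[] } {
    const handled: string[] = [];
    return {
        handled,
        async *[Symbol.asyncIterator]() {
            for (const url of urls) yield { url };
        },
        async markRequestHandled(request: RequestLike) {
            handled.push(request.url);
        },
    };
}
```

    With a real SitemapRequestList, the same collectUrls loop should apply as long as the instance satisfies the two members of the stand-in interface.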

fetchNextRequest

  • fetchNextRequest(): Promise<null | Request<Dictionary>>
  • Gets the next Request to process. First, the function gets a request previously reclaimed using the RequestList.reclaimRequest function, if there is one. Otherwise it gets the next request from sources.

    The returned Promise resolves to null if there are no more requests to process.


    Returns Promise<null | Request<Dictionary>>

handledCount

  • handledCount(): number
  • Returns the number of handled requests.


    Returns number

isEmpty

  • isEmpty(): Promise<boolean>
  • Resolves to true if the next call to IRequestList.fetchNextRequest would resolve to null; otherwise resolves to false. Note that even if the list is empty, some requests may still be in the middle of processing.


    Returns Promise<boolean>

isFinished

  • isFinished(): Promise<boolean>
  • Resolves to true if all requests have been handled and none are left.


    Returns Promise<boolean>

isSitemapFullyLoaded

  • isSitemapFullyLoaded(): boolean
  • Indicates whether the background processing of sitemap contents has successfully finished.

    If this is false, the background processing is either still in progress or was aborted.


    Returns boolean

length

  • length(): number
  • Returns the total number of unique requests present in the list.


    Returns number
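
    The status methods documented above (isEmpty, isFinished, isSitemapFullyLoaded, handledCount, length) can be combined into a single progress snapshot. The sketch below is one possible way to do that. ListStatusSource, ListStatus, Phase and getListStatus are hypothetical names, and the phase labels are an interpretation of the descriptions above rather than anything the library defines.

```typescript
// Structural stand-in listing only the status methods documented here
// (an assumption for this sketch, not the library's exported type).
interface ListStatusSource {
    isEmpty(): Promise<boolean>;
    isFinished(): Promise<boolean>;
    isSitemapFullyLoaded(): boolean;
    handledCount(): number;
    length(): number;
}

// 'pending'  : fetchNextRequest would still return a request
// 'draining' : nothing left to fetch, but some requests are still being processed
// 'finished' : every request has been handled
type Phase = 'pending' | 'draining' | 'finished';

interface ListStatus {
    phase: Phase;
    sitemapLoaded: boolean;
    handled: number;
    total: number;
}

async function getListStatus(list: ListStatusSource): Promise<ListStatus> {
    // isEmpty and isFinished are asynchronous; the other three are plain synchronous calls.
    const [empty, finished] = await Promise.all([list.isEmpty(), list.isFinished()]);

    let phase: Phase = 'pending';
    if (finished) phase = 'finished';
    else if (empty) phase = 'draining';

    return {
        phase,
        sitemapLoaded: list.isSitemapFullyLoaded(),
        handled: list.handledCount(),
        total: list.length(),
    };
}
```

    A caller might log this snapshot periodically to watch progress while the sitemap is still loading in the background.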

markRequestHandled

  • markRequestHandled(request: Request<Dictionary>): Promise<void>
  • Marks the request as handled after successful processing.


    Parameters

      • request: Request<Dictionary>

    Returns Promise<void>

persistState

  • persistState(): Promise<void>
  • Persists the current state of the IRequestList into the default KeyValueStore. The state is persisted automatically at regular intervals, but calling this method manually is useful when you want the most current state to be available after you pause or stop fetching requests, for example after you pause or abort a crawl, or just before a server migration.


    Returns Promise<void>

reclaimRequest

  • reclaimRequest(request: Request<Dictionary>): Promise<void>
  • Returns a request to the list if its processing failed. The request becomes available again in the next call to fetchNextRequest().


    Parameters

      • request: Request<Dictionary>

    Returns Promise<void>
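
    The interplay of fetchNextRequest, markRequestHandled and reclaimRequest described above can be sketched as a manual processing loop. This is an illustration under stated assumptions: RequestLike, RequestListLike, drain and FakeRequestList are hypothetical names, the retry limit (maxAttempts) is a choice made for this sketch and not something the list enforces, and the fake list only mimics the documented "reclaimed requests come back first" behavior.

```typescript
// Minimal structural stand-ins (assumptions for this sketch, not the library's real types).
interface RequestLike {
    url: string;
}

interface RequestListLike {
    fetchNextRequest(): Promise<RequestLike | null>;
    markRequestHandled(request: RequestLike): Promise<void>;
    reclaimRequest(request: RequestLike): Promise<void>;
}

// Processes requests until fetchNextRequest resolves to null.
// Failed requests are reclaimed and retried up to maxAttempts times;
// the URLs that never succeeded are returned.
async function drain(
    list: RequestListLike,
    handler: (request: RequestLike) => Promise<void>,
    maxAttempts = 3,
): Promise<string[]> {
    const attempts = new Map<string, number>();
    const failed: string[] = [];
    let request: RequestLike | null;
    while ((request = await list.fetchNextRequest()) !== null) {
        const attempt = (attempts.get(request.url) ?? 0) + 1;
        attempts.set(request.url, attempt);
        try {
            await handler(request);
            await list.markRequestHandled(request);
        } catch {
            if (attempt < maxAttempts) {
                // Put the request back; the next fetchNextRequest call returns it again.
                await list.reclaimRequest(request);
            } else {
                // Give up: record the failure and mark it handled so the loop can finish.
                failed.push(request.url);
                await list.markRequestHandled(request);
            }
        }
    }
    return failed;
}

// Hypothetical in-memory list, used only to exercise the loop above.
class FakeRequestList implements RequestListLike {
    private readonly pending: RequestLike[];
    private readonly reclaimed: RequestLike[] = [];
    handled = 0;

    constructor(urls: string[]) {
        this.pending = urls.map((url) => ({ url }));
    }

    async fetchNextRequest(): Promise<RequestLike | null> {
        return this.reclaimed.shift() ?? this.pending.shift() ?? null;
    }

    async markRequestHandled(): Promise<void> {
        this.handled += 1;
    }

    async reclaimRequest(request: RequestLike): Promise<void> {
        this.reclaimed.push(request);
    }
}
```

    Capping the number of attempts keeps a permanently failing URL from being reclaimed forever. Marking the abandoned request as handled is one way to let the list reach a finished state.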

teardown

  • teardown(): Promise<void>
  • Aborts the internal sitemap loading, stops the processing of the sitemap contents and drops all the pending URLs.

    After this method is called, fetchNextRequest() always resolves to null.


    Returns Promise<void>

static open

  • Opens a sitemap and starts processing it.

    Resolves to a new instance of SitemapRequestList, which might not be fully loaded yet, i.e. the sitemap might still be loading in the background.

    Track the loading progress using the isSitemapFullyLoaded() method.


    Parameters

    Returns Promise<SitemapRequestList>