SitemapRequestList
Implements IRequestList
Methods
[asyncIterator]
Gets the next Request to process. First, the function gets a request previously reclaimed using the RequestList.reclaimRequest function, if there is any. Otherwise it gets the next request from sources.
The function resolves to null if there are no more requests to process.
Can be used to iterate over the RequestList instance in a for await .. of loop. Provides an alternative to the repeated use of fetchNextRequest.
Returns AsyncGenerator<Request<Dictionary>, void, unknown>
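The iteration described above could look roughly like the sketch below. It is written against a small structural stand-in rather than the real class: RequestLike, IterableRequestList and collectUrls are hypothetical names introduced here, and only the async iterator and markRequestHandled from this page are assumed to exist.

```typescript
// Hypothetical stand-ins for the documented types (not the library's definitions).
interface RequestLike {
    url: string;
}

interface IterableRequestList extends AsyncIterable<RequestLike> {
    markRequestHandled(request: RequestLike): Promise<void>;
}

// Iterates the list with for await .. of and marks each request as handled
// once its URL has been recorded.
async function collectUrls(list: IterableRequestList): Promise<string[]> {
    const urls: string[] = [];
    for await (const request of list) {
        urls.push(request.url);
        await list.markRequestHandled(request);
    }
    return urls;
}
```

The loop ends on its own once the iterator is exhausted, so no explicit null check is needed here.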
fetchNextRequest
Gets the next Request to process. First, the function gets a request previously reclaimed using the RequestList.reclaimRequest function, if there is any. Otherwise it gets the next request from sources.
The function's Promise resolves to null if there are no more requests to process.
Returns Promise<null | Request<Dictionary>>
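A minimal sketch of the manual fetch loop this method implies, assuming a list object that exposes fetchNextRequest and markRequestHandled as documented on this page. FetchableList, RequestLike and processAll are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-ins for the documented types (not the library's definitions).
interface RequestLike {
    url: string;
}

interface FetchableList {
    fetchNextRequest(): Promise<RequestLike | null>;
    markRequestHandled(request: RequestLike): Promise<void>;
}

// Fetches requests one by one until fetchNextRequest resolves to null,
// marking each as handled after the handler completes.
async function processAll(
    list: FetchableList,
    handle: (request: RequestLike) => Promise<void>,
): Promise<number> {
    let processed = 0;
    let request = await list.fetchNextRequest();
    while (request !== null) {
        await handle(request);
        await list.markRequestHandled(request);
        processed += 1;
        request = await list.fetchNextRequest();
    }
    return processed;
}
```

This version does not catch handler errors; a failing handler stops the loop and leaves the current request unhandled.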
handledCount
Returns the number of handled requests.
Returns number
isEmpty
Resolves to true if the next call to IRequestList.fetchNextRequest function would return null, otherwise it resolves to false. Note that even if the list is empty, there might be some pending requests currently being processed.
Returns Promise<boolean>
isFinished
Resolves to true if all requests have already been handled and there are no more left.
Returns Promise<boolean>
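The difference between isEmpty and isFinished can be made concrete with a small helper. This is only one reading of the two descriptions above: ProgressList, ListState and listState are hypothetical names, and the three-state split is an assumption, not something the library defines.

```typescript
// Hypothetical stand-in exposing only the two documented checks.
interface ProgressList {
    isEmpty(): Promise<boolean>;
    isFinished(): Promise<boolean>;
}

// 'has-more'  : fetchNextRequest would still return a request
// 'draining'  : nothing left to fetch, but some requests are not handled yet
// 'finished'  : every request has been handled
type ListState = 'has-more' | 'draining' | 'finished';

async function listState(list: ProgressList): Promise<ListState> {
    if (await list.isFinished()) return 'finished';
    if (await list.isEmpty()) return 'draining';
    return 'has-more';
}
```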
isSitemapFullyLoaded
Indicates whether the background processing of sitemap contents has successfully finished.
If this is false, the background processing is either still in progress or was aborted.
Returns boolean
length
Returns the total number of unique requests present in the list.
Returns number
markRequestHandled
Marks request as handled after successful processing.
Parameters
request: Request<Dictionary>
Returns Promise<void>
persistState
Persists the current state of the IRequestList into the default KeyValueStore. The state is persisted automatically in regular intervals, but calling this method manually is useful when you want the most current state to be available after you pause or stop fetching its requests, for example after you pause or abort a crawl, or just before a server migration.
Returns Promise<void>
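One way to act on this advice is to persist explicitly whenever a processing loop stops, whether it ran out of requests or was aborted. The sketch below assumes a list with the documented fetchNextRequest, markRequestHandled and persistState methods; PersistableList, crawlUntilAborted and the plain `{ aborted }` signal object are hypothetical.

```typescript
// Hypothetical stand-ins for the documented types (not the library's definitions).
interface RequestLike {
    url: string;
}

interface PersistableList {
    fetchNextRequest(): Promise<RequestLike | null>;
    markRequestHandled(request: RequestLike): Promise<void>;
    persistState(): Promise<void>;
}

// Processes requests until the list is drained or the signal is aborted,
// then persists the state so a later run can see the latest progress.
async function crawlUntilAborted(
    list: PersistableList,
    handle: (request: RequestLike) => Promise<void>,
    signal: { readonly aborted: boolean },
): Promise<number> {
    let processed = 0;
    try {
        while (!signal.aborted) {
            const request = await list.fetchNextRequest();
            if (request === null) break;
            await handle(request);
            await list.markRequestHandled(request);
            processed += 1;
        }
    } finally {
        // Runs on normal completion, abort, and handler errors alike.
        await list.persistState();
    }
    return processed;
}
```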
reclaimRequest
Reclaims the request to the list if its processing failed. The request will become available again in the next call to this.fetchNextRequest().
Parameters
request: Request<Dictionary>
Returns Promise<void>
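A retry loop built on reclaimRequest might look like the following sketch. The attempt cap and the decision to mark a request as handled once the cap is reached are assumptions made here to keep the loop from running forever; RetryableList, RequestLike and processWithRetries are hypothetical names.

```typescript
// Hypothetical stand-ins for the documented types (not the library's definitions).
interface RequestLike {
    url: string;
}

interface RetryableList {
    fetchNextRequest(): Promise<RequestLike | null>;
    markRequestHandled(request: RequestLike): Promise<void>;
    reclaimRequest(request: RequestLike): Promise<void>;
}

// Reclaims failed requests so they are fetched again, giving up on a URL
// after maxAttempts tries (assumed policy, not prescribed by the library).
async function processWithRetries(
    list: RetryableList,
    handle: (request: RequestLike) => Promise<void>,
    maxAttempts = 3,
): Promise<{ succeeded: string[]; failed: string[] }> {
    const attempts = new Map<string, number>();
    const succeeded: string[] = [];
    const failed: string[] = [];

    let request = await list.fetchNextRequest();
    while (request !== null) {
        const attempt = (attempts.get(request.url) ?? 0) + 1;
        attempts.set(request.url, attempt);
        try {
            await handle(request);
            await list.markRequestHandled(request);
            succeeded.push(request.url);
        } catch {
            if (attempt < maxAttempts) {
                await list.reclaimRequest(request);
            } else {
                // Out of attempts: stop retrying and record the failure.
                await list.markRequestHandled(request);
                failed.push(request.url);
            }
        }
        request = await list.fetchNextRequest();
    }
    return { succeeded, failed };
}
```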
teardown
Aborts the internal sitemap loading, stops the processing of the sitemap contents and drops all the pending URLs.
Calling fetchNextRequest() after this method will always return null.
Returns Promise<void>
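When only part of a sitemap is needed, teardown can be placed in a finally block so the background loading is stopped however the loop exits. A minimal sketch, assuming the documented fetchNextRequest, markRequestHandled and teardown methods; DisposableList, RequestLike and takeFirst are hypothetical names.

```typescript
// Hypothetical stand-ins for the documented types (not the library's definitions).
interface RequestLike {
    url: string;
}

interface DisposableList {
    fetchNextRequest(): Promise<RequestLike | null>;
    markRequestHandled(request: RequestLike): Promise<void>;
    teardown(): Promise<void>;
}

// Takes at most `limit` URLs from the list, then tears it down so that
// loading stops and the remaining pending URLs are dropped.
async function takeFirst(list: DisposableList, limit: number): Promise<string[]> {
    const urls: string[] = [];
    try {
        while (urls.length < limit) {
            const request = await list.fetchNextRequest();
            if (request === null) break;
            urls.push(request.url);
            await list.markRequestHandled(request);
        }
    } finally {
        await list.teardown();
    }
    return urls;
}
```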
static open
Open a sitemap and start processing it.
Resolves to a new instance of SitemapRequestList, which might not be fully loaded yet, i.e. the sitemap might still be loading in the background.
Track the loading progress using the isSitemapFullyLoaded method.
Parameters
options: SitemapRequestListOptions
Returns Promise<SitemapRequestList>
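Because the returned list may still be loading, a caller that needs the complete sitemap before proceeding could poll isSitemapFullyLoaded. The sketch below is an assumption-laden illustration: LoadableList and waitUntilLoaded are hypothetical, and the timeout exists because a false value can also mean loading was aborted, in which case polling alone would never finish.

```typescript
// Hypothetical stand-in exposing only the documented loading check.
interface LoadableList {
    isSitemapFullyLoaded(): boolean;
}

// Polls until the sitemap reports as fully loaded.
// Resolves to true on success and false if the timeout elapses first.
async function waitUntilLoaded(
    list: LoadableList,
    timeoutMs = 30000,
    pollMs = 100,
): Promise<boolean> {
    const deadline = Date.now() + timeoutMs;
    while (!list.isSitemapFullyLoaded()) {
        if (Date.now() >= deadline) return false;
        await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
    }
    return true;
}
```

In most crawls this wait is unnecessary, since requests can be fetched while the sitemap is still loading in the background.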
A list of URLs to crawl parsed from a sitemap.
The loading of the sitemap is performed in the background so that crawling can start before the sitemap is fully loaded.