CrawlerRunOptions
Hierarchy
- CrawlerAddRequestsOptions
- CrawlerRunOptions
Properties
optional inherited batchSize
optional inherited forefront
If set to true:
- while adding the request to the queue: the request will be added to the foremost position in the queue.
- while reclaiming the request: the request will be placed at the beginning of the queue, so that it's returned in the next call to RequestQueue.fetchNextRequest. By default, it's put at the end of the queue.
In case the request is already present in the queue, this option has no effect.
If multiple requests are added with this option at once, the order in which subsequent fetchNextRequest calls
return them is arbitrary.
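The forefront rules above can be illustrated with a toy in-memory queue. This is a minimal sketch only, not Crawlee's RequestQueue: the ToyQueue class, its method names, and the example URLs are all hypothetical. Unlike the real queue, this model returns forefront items in a fixed order, whereas the documented order is arbitrary.

```typescript
// Toy model of the documented `forefront` semantics.
// NOT Crawlee's RequestQueue: every name in this block is hypothetical.
class ToyQueue {
    private pending: string[] = [];
    private seen = new Set<string>();

    /** Adds a URL; returns false if the queue already knows it. */
    add(url: string, options: { forefront?: boolean } = {}): boolean {
        // Already present: the option has no effect and nothing moves.
        if (this.seen.has(url)) return false;
        this.seen.add(url);
        if (options.forefront) this.pending.unshift(url);
        else this.pending.push(url);
        return true;
    }

    /** Returns the next URL, or undefined when nothing is pending. */
    fetchNext(): string | undefined {
        return this.pending.shift();
    }

    /** Puts a previously fetched URL back so it can be retried. */
    reclaim(url: string, options: { forefront?: boolean } = {}): void {
        if (options.forefront) this.pending.unshift(url);
        else this.pending.push(url); // default: back of the queue
    }
}

const toyQueue = new ToyQueue();
toyQueue.add('https://example.com/a');
toyQueue.add('https://example.com/b');
toyQueue.add('https://example.com/urgent', { forefront: true });

// Re-adding a known URL with forefront does not move it to the front.
const duplicateAdded = toyQueue.add('https://example.com/b', { forefront: true });

// The forefront request is the next one fetched.
const firstFetched = toyQueue.fetchNext();
```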
optional inherited maxNewRequests
If set, only this many actually new requests (i.e. not already present in the queue) will be added.
Once the budget is reached, remaining requests from the iterable will be collected in
requestsOverLimit instead.
This is useful in combination with maxRequestsPerCrawl to avoid duplicate URLs consuming the budget.
Note: Setting this option implicitly enables waitForAllRequestsToBeAdded,
since all batches must complete before leftover requests can be accurately reported.
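The budget behaviour can be modelled with a small standalone function. This is a sketch under assumptions, not the library's implementation: addWithBudget, the BudgetResult shape, and the example URLs are hypothetical, and only the requestsOverLimit name comes from the description above. How already-queued URLs are treated once the budget is exhausted is also an assumption of this model (here they are always skipped).

```typescript
// Hypothetical model of the documented `maxNewRequests` budget.
// NOT Crawlee code: `addWithBudget` and `BudgetResult` are illustrative names.
interface BudgetResult {
    added: string[];
    requestsOverLimit: string[];
}

function addWithBudget(
    alreadyQueued: Set<string>,
    urls: readonly string[],
    maxNewRequests: number,
): BudgetResult {
    const added: string[] = [];
    const requestsOverLimit: string[] = [];
    for (const url of urls) {
        // URLs already in the queue are not new, so they do not consume the budget.
        if (alreadyQueued.has(url)) continue;
        if (added.length >= maxNewRequests) {
            // Budget exhausted: collect the leftover instead of enqueueing it.
            requestsOverLimit.push(url);
            continue;
        }
        alreadyQueued.add(url);
        added.push(url);
    }
    return { added, requestsOverLimit };
}

const queuedUrls = new Set<string>(['https://example.com/a']);
const budgetResult = addWithBudget(
    queuedUrls,
    [
        'https://example.com/a', // already queued, skipped without cost
        'https://example.com/b',
        'https://example.com/c',
        'https://example.com/d', // over the budget of 2
    ],
    2,
);
```

In this model the duplicate of /a leaves the full budget of two for /b and /c, which is the situation the text describes for combining the option with maxRequestsPerCrawl.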
optional purgeRequestQueue
Whether to purge the RequestQueue before running the crawler again. Defaults to true, so that failed requests can be reprocessed. When disabled, only new requests will be considered. Note that even a failed request is considered handled.
optional inherited waitBetweenBatchesMillis
optional inherited waitForAllRequestsToBeAdded
Whether to wait for all the provided requests to be added, instead of waiting just for the initial batch of up to batchSize.
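The interaction between batchSize and waitForAllRequestsToBeAdded can be sketched as a planning function that splits the input into batches and reports which ones the caller waits for. This is an illustrative model only: planBatches and BatchPlan are hypothetical names, and the real library performs the additions asynchronously rather than returning a plan.

```typescript
// Hypothetical model of batching: which batches are awaited before the call returns.
// NOT Crawlee code: `planBatches` and `BatchPlan` are illustrative names.
interface BatchPlan {
    awaitedBeforeReturn: string[][];
    addedInBackground: string[][];
}

function planBatches(
    urls: readonly string[],
    batchSize: number,
    waitForAllRequestsToBeAdded: boolean,
): BatchPlan {
    if (batchSize < 1) throw new RangeError('batchSize must be at least 1');

    // Split the input into consecutive batches of up to `batchSize` items.
    const batches: string[][] = [];
    for (let i = 0; i < urls.length; i += batchSize) {
        batches.push(urls.slice(i, i + batchSize));
    }

    if (waitForAllRequestsToBeAdded) {
        // Every batch is awaited; nothing is left for the background.
        return { awaitedBeforeReturn: batches, addedInBackground: [] };
    }
    // Only the initial batch is awaited; the rest follow later.
    return {
        awaitedBeforeReturn: batches.slice(0, 1),
        addedInBackground: batches.slice(1),
    };
}

const exampleUrls = ['u1', 'u2', 'u3', 'u4', 'u5'];
const planInitialOnly = planBatches(exampleUrls, 2, false);
const planWaitForAll = planBatches(exampleUrls, 2, true);
```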