Version: Next

abstract RequestProvider

Represents a provider of requests/URLs to crawl.

Hierarchy

RequestProvider
- RequestQueueV1
- RequestQueue

Implements

Constructors

constructor

new RequestProvider(options, config): RequestProvider

Parameters
- options: InternalRequestProviderOptions
- config: Configuration = ...
Returns RequestProvider

Properties

assumedHandledCount

assumedHandledCount: number = 0

assumedTotalCount

assumedTotalCount: number = 0

client

client: RequestQueueClient

clientKey

clientKey: string = ...

readonly config

config: Configuration = ...

id

id: string

internalTimeoutMillis

internalTimeoutMillis: number = ...

log

log: Log

optional name

name?: string

requestLockSecs

requestLockSecs: number = ...

timeoutSecs

timeoutSecs: number = 30

Methods

[asyncIterator]

[asyncIterator](): AsyncGenerator<Request<Dictionary>, void, unknown>

Implementation of IRequestManager.[asyncIterator]
Can be used to iterate over the RequestManager instance in a for await...of loop. Provides an alternative to repeated calls to fetchNextRequest.
Returns AsyncGenerator<Request<Dictionary>, void, unknown>
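The iteration pattern described above can be sketched as an async generator that keeps calling fetchNextRequest until it returns null. This is only an illustration: `iterateRequests` and the `ArraySource` stand-in are hypothetical names, and the sketch does not claim to mirror the library's internal implementation.

```typescript
// Anything exposing a fetchNextRequest() that resolves to an item or null.
interface FetchSource<T> {
    fetchNextRequest(): Promise<T | null>;
}

// Hypothetical helper: yields items until the source reports nothing pending.
async function* iterateRequests<T>(source: FetchSource<T>): AsyncGenerator<T, void, unknown> {
    while (true) {
        const request = await source.fetchNextRequest();
        if (request === null) return; // no pending requests right now
        yield request;
    }
}

// Hypothetical in-memory stand-in so the sketch runs without a storage backend.
class ArraySource implements FetchSource<string> {
    private readonly items: string[];

    constructor(items: string[]) {
        this.items = [...items];
    }

    async fetchNextRequest(): Promise<string | null> {
        return this.items.shift() ?? null;
    }
}

// Collects every URL produced by the for await...of loop.
async function collectUrls(urls: string[]): Promise<string[]> {
    const seen: string[] = [];
    for await (const url of iterateRequests(new ArraySource(urls))) {
        seen.push(url);
    }
    return seen;
}
```

The loop stops at the first null, so, as with fetchNextRequest itself, ending the iteration does not by itself prove that all requests were handled.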

addRequest

addRequest(requestLike, options): Promise<RequestQueueOperationInfo>

Implementation of IRequestManager.addRequest
Adds a request to the queue.

If a request with the same uniqueKey property is already present in the queue, it will not be updated. You can find out whether this happened from the resulting RequestQueueOperationInfo object.

To add multiple requests to the queue by extracting links from a webpage, see the enqueueLinks helper function.
Parameters
- requestLike: Source
  Request object or vanilla object with request data. Note that the function sets the uniqueKey and id fields on the passed Request.
- optional options: RequestQueueOperationOptions = {}
  Request queue operation options.
Returns Promise<RequestQueueOperationInfo>
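The deduplication rule for addRequest can be illustrated with a small in-memory stand-in. Everything here is a sketch under stated assumptions: `DedupQueue` is hypothetical, the `wasAlreadyPresent`, `wasAlreadyHandled` and `requestId` field names are assumed for the operation info object, and falling back to the URL as the uniqueKey is a simplification made for this example.

```typescript
// Assumed shape of the operation info object; field names are an assumption.
interface OperationInfo {
    requestId: string;
    wasAlreadyPresent: boolean;
    wasAlreadyHandled: boolean;
}

interface SourceLike {
    url: string;
    uniqueKey?: string;
}

// Hypothetical in-memory queue that only models uniqueKey deduplication.
class DedupQueue {
    private readonly seen = new Map<string, { id: string; handled: boolean }>();

    async addRequest(source: SourceLike): Promise<OperationInfo> {
        // Simplification: use the URL itself when no uniqueKey is given.
        const uniqueKey = source.uniqueKey ?? source.url;
        const existing = this.seen.get(uniqueKey);
        if (existing !== undefined) {
            // Same uniqueKey: the stored request is left untouched.
            return {
                requestId: existing.id,
                wasAlreadyPresent: true,
                wasAlreadyHandled: existing.handled,
            };
        }
        const id = `req-${this.seen.size + 1}`;
        this.seen.set(uniqueKey, { id, handled: false });
        return { requestId: id, wasAlreadyPresent: false, wasAlreadyHandled: false };
    }
}

// Adds the same URL twice and returns both results for inspection.
async function addTwice(url: string): Promise<[OperationInfo, OperationInfo]> {
    const queue = new DedupQueue();
    const first = await queue.addRequest({ url });
    const second = await queue.addRequest({ url });
    return [first, second];
}
```

In this sketch the second call reports `wasAlreadyPresent: true` and returns the ID of the request stored by the first call.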

addRequests

addRequests(requestsLike, options): Promise<BatchAddRequestsResult>

Adds requests to the queue in batches of 25. This method waits until all the requests are added to the queue before resolving. If you don't want to block processing, prefer queue.addRequestsBatched() or crawler.addRequests(): those methods wait only for the initial 1000 requests, start processing right after that, and continue adding the rest in the background.

If a request passed in is already present due to its uniqueKey property being the same, it will not be updated. You can find out whether this happened by finding the request in the resulting BatchAddRequestsResult object.
Parameters
- requestsLike: RequestsLike
  Request objects or vanilla objects with request data. Note that the function sets the uniqueKey and id fields on the passed requests if they are missing.
- optional options: RequestQueueOperationOptions = {}
  Request queue operation options.
Returns Promise<BatchAddRequestsResult>
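To make the "batches of 25" behaviour concrete, the following sketch splits an input list into fixed-size chunks. `toBatches` is a hypothetical helper written for this illustration; it shows the chunking idea only and is not taken from the library.

```typescript
// Hypothetical helper: splits items into consecutive batches of at most batchSize.
function toBatches<T>(items: readonly T[], batchSize = 25): T[][] {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
        throw new RangeError('batchSize must be a positive integer');
    }
    const batches: T[][] = [];
    for (let start = 0; start < items.length; start += batchSize) {
        batches.push(items.slice(start, start + batchSize));
    }
    return batches;
}

// 60 URLs end up in three batches: 25, 25 and 10 items.
const exampleUrls = Array.from({ length: 60 }, (_, i) => `https://example.com/page/${i}`);
const exampleBatchSizes = toBatches(exampleUrls).map((batch) => batch.length);
```

A caller that awaits every batch in turn only resolves once the last, possibly shorter, batch has been stored, which is why this method blocks longer than addRequestsBatched for large inputs.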

addRequestsBatched

addRequestsBatched(requests, options): Promise<AddRequestsBatchedResult>

Implementation of IRequestManager.addRequestsBatched
Adds requests to the queue in batches. By default, it resolves after the initial batch is added and continues adding the rest in the background. You can configure the batch size via the batchSize option and the sleep time between batches via waitBetweenBatchesMillis. If you want to wait for all batches to be added to the queue, await the waitForAllRequestsToBeAdded promise you get in the response object.
Parameters
- requests: RequestsLike
  The requests to add
- options: AddRequestsBatchedOptions = {}
  Options for the request queue
Returns Promise<AddRequestsBatchedResult>
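The "resolve after the first batch, continue in the background" behaviour can be sketched as follows. `addBatched` and its `addBatch` callback are hypothetical, the `addedRequests` field name is an assumption, and the default values used for `batchSize` and `waitBetweenBatchesMillis` are chosen for this sketch only.

```typescript
interface BatchedResult {
    // Requests stored by the time the outer promise resolves (assumed field name).
    addedRequests: string[];
    // Resolves with the remaining requests once every later batch is stored.
    waitForAllRequestsToBeAdded: Promise<string[]>;
}

interface BatchedOptions {
    batchSize?: number;
    waitBetweenBatchesMillis?: number;
}

// Hypothetical sketch: the first batch is awaited, the rest is added in the background.
async function addBatched(
    urls: string[],
    addBatch: (batch: string[]) => Promise<void>,
    options: BatchedOptions = {},
): Promise<BatchedResult> {
    // Defaults are assumptions made for this illustration.
    const { batchSize = 1000, waitBetweenBatchesMillis = 1000 } = options;

    const first = urls.slice(0, batchSize);
    await addBatch(first);

    // Not awaited here: the caller decides whether to wait for it.
    const waitForAllRequestsToBeAdded = (async () => {
        const rest: string[] = [];
        for (let start = batchSize; start < urls.length; start += batchSize) {
            await new Promise<void>((resolve) => setTimeout(resolve, waitBetweenBatchesMillis));
            const batch = urls.slice(start, start + batchSize);
            await addBatch(batch);
            rest.push(...batch);
        }
        return rest;
    })();

    return { addedRequests: first, waitForAllRequestsToBeAdded };
}
```

A caller can start processing as soon as `addBatched` resolves and later `await result.waitForAllRequestsToBeAdded` when it needs the complete set to be in the queue.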

drop

drop(): Promise<void>

Removes the queue either from the Apify Cloud storage or from the local database, depending on the mode of operation.
Returns Promise<void>

abstract fetchNextRequest

fetchNextRequest<T>(): Promise<null | Request<T>>

Implementation of IRequestManager.fetchNextRequest
Returns the next request in the queue to be processed, or null if there are no more pending requests.

Once you successfully finish processing the request, you need to call RequestQueue.markRequestHandled to mark the request as handled in the queue. If there was an error in processing the request, call RequestQueue.reclaimRequest instead, so that the queue will give the request to some other consumer in another call to the fetchNextRequest function.

Note that a null return value doesn't mean that queue processing has finished; it means there are currently no pending requests. To check whether all requests in the queue were finished, use RequestQueue.isFinished instead.
Returns Promise<null | Request<T>>
Returns the request object or null if there are no more pending requests.
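The fetch, handle, reclaim cycle described above can be sketched as a small consumer loop. The `QueueLike` interface mirrors only the documented method names, while `InMemoryQueue`, `processAll`, the `retryCount` field and the `maxRetries` limit are hypothetical and exist just to make the example runnable.

```typescript
interface QueueRequest {
    url: string;
    uniqueKey: string;
    retryCount: number;
}

// Mirrors the documented method names only; not the real library types.
interface QueueLike {
    fetchNextRequest(): Promise<QueueRequest | null>;
    markRequestHandled(request: QueueRequest): Promise<unknown>;
    reclaimRequest(request: QueueRequest): Promise<unknown>;
    isFinished(): Promise<boolean>;
}

// Hypothetical in-memory stand-in for a real queue.
class InMemoryQueue implements QueueLike {
    private readonly pending: QueueRequest[];
    private readonly inProgress = new Set<string>();
    readonly handled: string[] = [];

    constructor(urls: string[]) {
        this.pending = urls.map((url) => ({ url, uniqueKey: url, retryCount: 0 }));
    }

    async fetchNextRequest(): Promise<QueueRequest | null> {
        const next = this.pending.shift() ?? null;
        if (next !== null) this.inProgress.add(next.uniqueKey);
        return next;
    }

    async markRequestHandled(request: QueueRequest): Promise<void> {
        this.inProgress.delete(request.uniqueKey);
        this.handled.push(request.uniqueKey);
    }

    async reclaimRequest(request: QueueRequest): Promise<void> {
        this.inProgress.delete(request.uniqueKey);
        this.pending.push(request); // goes to the back, to be fetched again later
    }

    async isFinished(): Promise<boolean> {
        return this.pending.length === 0 && this.inProgress.size === 0;
    }
}

// Marks requests handled on success and reclaims them on failure.
async function processAll(
    queue: QueueLike,
    handler: (request: QueueRequest) => Promise<void>,
    maxRetries = 2,
): Promise<void> {
    while (!(await queue.isFinished())) {
        const request = await queue.fetchNextRequest();
        // null only means "nothing pending now"; a real consumer would wait before retrying.
        if (request === null) continue;
        try {
            await handler(request);
            await queue.markRequestHandled(request);
        } catch {
            request.retryCount += 1;
            if (request.retryCount > maxRetries) {
                await queue.markRequestHandled(request); // give up on this request
            } else {
                await queue.reclaimRequest(request);
            }
        }
    }
}
```

The loop terminates on isFinished rather than on a null from fetchNextRequest, which matches the note above that null alone does not mean processing is complete.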

getInfo

getInfo(): Promise<undefined | RequestQueueInfo>

Returns an object containing general information about the request queue.

The function returns the same object as the Apify API Client's getQueue function, which in turn calls the Get request queue API endpoint.

Example:

{
  id: "WkzbQMuFYuamGv3YF",
  name: "my-queue",
  userId: "wRsJZtadYvn4mBZmm",
  createdAt: new Date("2015-12-12T07:34:14.202Z"),
  modifiedAt: new Date("2015-12-13T08:36:13.202Z"),
  accessedAt: new Date("2015-12-14T08:36:13.202Z"),
  totalRequestCount: 25,
  handledRequestCount: 5,
  pendingRequestCount: 20,
}

Returns Promise<undefined | RequestQueueInfo>

getPendingCount

getPendingCount(): number

Implementation of IRequestManager.getPendingCount
Returns an offline approximation of the total number of pending requests in the queue.

Survives restarts and Actor migrations.
Returns number

getRequest

getRequest<T>(id): Promise<null | Request<T>>

Gets the request with the given ID from the queue.
Parameters
- id: string
  ID of the request.
Returns Promise<null | Request<T>>
Returns the request object, or null if it was not found.

getTotalCount

getTotalCount(): number

Implementation of IRequestManager.getTotalCount
Returns an offline approximation of the total number of requests in the queue (i.e. pending + handled).

Survives restarts and Actor migrations.
Returns number

handledCount

handledCount(): Promise<number>

Implementation of IRequestManager.handledCount
Returns the number of handled requests.
Returns Promise<number>

isEmpty

isEmpty(): Promise<boolean>

Implementation of IRequestManager.isEmpty
Resolves to true if the next call to RequestQueue.fetchNextRequest would return null, otherwise it resolves to false. Note that even if the queue is empty, there might be some pending requests currently being processed. If you need to ensure that there is no activity in the queue, use RequestQueue.isFinished.
Returns Promise<boolean>

abstract isFinished

isFinished(): Promise<boolean>

Implementation of IRequestManager.isFinished
Resolves to true if all requests were already handled and there are no more left. Due to the nature of distributed storage used by the queue, the function may occasionally return a false negative, but it shall never return a false positive.
Returns Promise<boolean>
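The difference between isEmpty and isFinished is easiest to see while a request is in progress. The sketch below uses a hypothetical `TinyQueue` stand-in that tracks pending and in-progress requests separately; it illustrates the documented semantics and is not the library's implementation.

```typescript
// Hypothetical in-memory queue tracking pending and in-progress requests.
class TinyQueue {
    private readonly pending: string[];
    private readonly inProgress = new Set<string>();

    constructor(urls: string[]) {
        this.pending = [...urls];
    }

    async fetchNextRequest(): Promise<string | null> {
        const next = this.pending.shift() ?? null;
        if (next !== null) this.inProgress.add(next);
        return next;
    }

    async markRequestHandled(url: string): Promise<void> {
        this.inProgress.delete(url);
    }

    // True when the next fetchNextRequest() call would return null.
    async isEmpty(): Promise<boolean> {
        return this.pending.length === 0;
    }

    // True only when nothing is pending and nothing is still being processed.
    async isFinished(): Promise<boolean> {
        return this.pending.length === 0 && this.inProgress.size === 0;
    }
}

interface QueueState {
    empty: boolean;
    finished: boolean;
}

// Records both flags before fetching, while processing, and after handling.
async function observeStates(): Promise<QueueState[]> {
    const queue = new TinyQueue(['https://example.com']);
    const states: QueueState[] = [];
    const snapshot = async (): Promise<void> => {
        states.push({ empty: await queue.isEmpty(), finished: await queue.isFinished() });
    };

    await snapshot(); // one pending request: not empty, not finished
    const url = await queue.fetchNextRequest();
    await snapshot(); // request in progress: empty, but not finished
    if (url !== null) await queue.markRequestHandled(url);
    await snapshot(); // request handled: empty and finished
    return states;
}
```

The middle snapshot is the case the isEmpty description warns about: the queue has nothing left to hand out, yet work is still under way.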

markRequestHandled

markRequestHandled(request): Promise<null | RequestQueueOperationInfo>

Implementation of IRequestManager.markRequestHandled
Marks a request that was previously returned by the RequestQueue.fetchNextRequest function as handled after successful processing. Handled requests will never again be returned by the fetchNextRequest function.
Parameters
- request: Request<Dictionary>
Returns Promise<null | RequestQueueOperationInfo>

reclaimRequest

reclaimRequest(request, options): Promise<null | RequestQueueOperationInfo>

Implementation of IRequestManager.reclaimRequest
Reclaims a failed request back to the queue, so that it can be returned for processing later again by another call to RequestQueue.fetchNextRequest. The request record in the queue is updated using the provided request parameter. For example, this lets you store the number of retries or error messages for the request.
Parameters
- request: Request<Dictionary>
- options: RequestQueueOperationOptions = {}
Returns Promise<null | RequestQueueOperationInfo>

static open

open(queueIdOrName, options): Promise<RequestProvider>

Opens a request queue and returns a promise resolving to an instance of the RequestQueue class.

RequestQueue represents a queue of URLs to crawl, which is stored either on the local filesystem or in the cloud. The queue is used for deep crawling of websites, where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.

For more details and code examples, see the RequestQueue class.
Parameters
- optional queueIdOrName: null | string
  ID or name of the request queue to be opened. If null or undefined, the function returns the default request queue associated with the crawler run.
- optional options: StorageManagerOptions = {}
  Open Request Queue options.
Returns Promise<RequestProvider>
