Version: Next

Request <UserData>

Represents a URL to be crawled, optionally including HTTP method, headers, payload and other metadata. The Request object also stores information about errors that occurred during processing of the request.

Each Request instance has the uniqueKey property, which can be either specified manually in the constructor or generated automatically from the URL. Two requests with the same uniqueKey are considered as pointing to the same web resource. This behavior applies to all Crawlee classes, such as RequestList, RequestQueue, PuppeteerCrawler or PlaywrightCrawler.
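The deduplication rule described above can be pictured with a small self-contained model. This is only a sketch: `SketchRequest` and `dedupeByUniqueKey` are hypothetical names invented for illustration, and the code does not reproduce how Crawlee itself stores or compares requests.

```typescript
// Illustrative model only: `SketchRequest` and `dedupeByUniqueKey` are
// hypothetical names, not part of Crawlee.
interface SketchRequest {
    url: string;
    uniqueKey: string;
}

// Keeps the first request seen for each uniqueKey and drops later ones.
function dedupeByUniqueKey(requests: SketchRequest[]): SketchRequest[] {
    const seen = new Map<string, SketchRequest>();
    for (const request of requests) {
        if (!seen.has(request.uniqueKey)) {
            seen.set(request.uniqueKey, request);
        }
    }
    // A Map iterates in insertion order, so the original order is preserved.
    return Array.from(seen.values());
}

const queued = dedupeByUniqueKey([
    { url: 'http://www.example.com/a', uniqueKey: 'page-a' },
    { url: 'http://www.example.com/a?ref=home', uniqueKey: 'page-a' },
    { url: 'http://www.example.com/b', uniqueKey: 'page-b' },
]);

console.log(queued.map((r) => r.url));
```

In this model the second entry is dropped even though its URL differs, because the manually chosen uniqueKey matches the first entry.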

To examine the actual request sent over HTTP, including all automatically filled headers, access the response.request object in the request handler.

Example use:

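import { Request } from 'crawlee';

// Create a request with a custom Accept header.
const request = new Request({
    url: 'http://www.example.com',
    headers: { Accept: 'application/json' },
});

...

// Attach custom data and record a processing error.
request.userData.foo = 'bar';
request.pushErrorMessage(new Error('Request failed!'));

...

// Read the custom data back later.
const foo = request.userData.foo;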

Index

Constructors

constructor

  • Request parameters including the URL, HTTP method and headers, and others.


    Type parameters

    • UserData: Dictionary = Dictionary

    Parameters

    Returns Request<UserData>

Properties

errorMessages

errorMessages: string[]

An array of error messages from request processing.

optional handledAt

handledAt?: string

ISO datetime string that indicates when the request was processed. It is null if the request has not been crawled yet.

optional headers

headers?: Record<string, string>

Object with HTTP headers. Each key is a header name and each value is the corresponding header value.

optional id

id?: string

Request ID

optional loadedUrl

loadedUrl?: string

The URL that was actually loaded after redirects, if any. HTTP redirects are guaranteed to be included.

When using PuppeteerCrawler or PlaywrightCrawler, meta tag and JavaScript redirects may or may not be included, depending on their nature. In general, redirects that happen immediately will most likely be included, but delayed redirects will not.
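As a minimal sketch of how loadedUrl might be used, the helper below compares it with url to guess whether a redirect took place. `LoadedInfo` and `wasRedirected` are hypothetical names; only the `url` and `loadedUrl` property names come from this page. The comparison is a plain string check, so it assumes both values are written the same way (for example, with or without a trailing slash).

```typescript
// Hypothetical shape that mirrors just the two properties used here.
interface LoadedInfo {
    url: string;
    loadedUrl?: string;
}

// Returns true when a loaded URL is known and differs from the requested one.
function wasRedirected(request: LoadedInfo): boolean {
    // loadedUrl is optional, so a missing value tells us nothing about redirects.
    if (request.loadedUrl === undefined) {
        return false;
    }
    return request.loadedUrl !== request.url;
}

const redirected = wasRedirected({
    url: 'http://www.example.com',
    loadedUrl: 'https://www.example.com/home',
});
console.log(redirected);
```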

method

method: AllowedHttpMethods

HTTP method, e.g. GET or POST.

noRetry

noRetry: boolean

If true, the request will not be automatically retried on error.

optional payload

payload?: string

HTTP request payload, e.g. for POST requests.

retryCount

retryCount: number

Indicates the number of times the crawling of the request has been retried on error.

uniqueKey

uniqueKey: string

A unique key identifying the request. Two requests with the same uniqueKey are considered as pointing to the same URL.

url

url: string

URL of the web page to crawl.

userData

userData: UserData = ...

Custom user data assigned to the request.

Accessors

label

  • get label(): undefined | string
  • set label(value: undefined | string): void
  • Shortcut for getting request.userData.label.


    Returns undefined | string

  • Shortcut for setting request.userData.label.


    Parameters

    • value: undefined | string

    Returns void
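The shortcut pattern behind the label accessor can be sketched with a small stand-alone class. `LabelledSketch` is a hypothetical name and this is not the library's source; it only illustrates a getter and setter that delegate to `userData.label`.

```typescript
// Hypothetical stand-in for a request that exposes `label` as a shortcut.
class LabelledSketch {
    userData: Record<string, unknown> = {};

    // Reads userData.label, returning undefined when it is not a string.
    get label(): string | undefined {
        const value = this.userData.label;
        return typeof value === 'string' ? value : undefined;
    }

    // Writes straight through to userData.label.
    set label(value: string | undefined) {
        this.userData.label = value;
    }
}

const sketch = new LabelledSketch();
sketch.label = 'DETAIL';
console.log(sketch.userData.label === sketch.label);
```

Because both accessors go through userData, a label set via either path is visible through the other.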

maxRetries

  • get maxRetries(): undefined | number
  • set maxRetries(value: undefined | number): void
  • Maximum number of retries for this request. Overrides the global maxRequestRetries option of BasicCrawler.


    Returns undefined | number

  • Maximum number of retries for this request. Overrides the global maxRequestRetries option of BasicCrawler.


    Parameters

    • value: undefined | number

    Returns void
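One way to picture how noRetry, retryCount and maxRetries could combine is the sketch below. `RetryState` and `shouldRetry` are hypothetical names, and the rule itself is an assumption made for illustration: the crawler's real retry logic may consider more than these three values.

```typescript
// Hypothetical shape mirroring the retry-related properties on this page.
interface RetryState {
    noRetry: boolean;
    retryCount: number;
    maxRetries?: number;
}

// Assumed rule: never retry when noRetry is set; otherwise retry while the
// retry count is below the per-request limit, falling back to the global one.
function shouldRetry(request: RetryState, globalMaxRequestRetries: number): boolean {
    if (request.noRetry) {
        return false;
    }
    const limit = request.maxRetries ?? globalMaxRequestRetries;
    return request.retryCount < limit;
}

// The per-request limit of 1 takes precedence over the global limit of 3.
const retryAgain = shouldRetry({ noRetry: false, retryCount: 1, maxRetries: 1 }, 3);
console.log(retryAgain);
```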

sessionRotationCount

  • get sessionRotationCount(): number
  • set sessionRotationCount(value: number): void
  • Indicates how many times the session has been rotated while crawling this request because of a session or proxy error.


    Returns number

  • Indicates how many times the session has been rotated while crawling this request because of a session or proxy error.


    Parameters

    • value: number

    Returns void

skipNavigation

  • get skipNavigation(): boolean
  • set skipNavigation(value: boolean): void
  • Tells the crawler processing this request to skip the navigation and process the request directly.


    Returns boolean

  • Tells the crawler processing this request to skip the navigation and process the request directly.


    Parameters

    • value: boolean

    Returns void

state

  • Describes the request's current lifecycle state.


    Returns RequestState

  • Describes the request's current lifecycle state.


    Parameters

    Returns void

Methods

pushErrorMessage

  • Stores information about an error that occurred during processing of this request.

    You should always use Error instances when throwing errors in JavaScript.

    Nevertheless, third-party libraries do not always throw an Error instance. To improve the debugging experience in that case, the function inspects the type of the passed argument and attempts to extract as much information as possible, since simply throwing a bad-type error would make debugging rather difficult.


    Parameters

    • errorOrMessage: unknown

      Error object or error message to be stored in the request.

    • optional options: PushErrorMessageOptions = {}

    Returns void
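The kind of type inspection described for pushErrorMessage can be sketched as follows. `describeError` is a hypothetical helper written for illustration; it is not the library's implementation, and the exact messages the real method stores may differ.

```typescript
// Hypothetical helper: turns any thrown value into a readable message.
function describeError(errorOrMessage: unknown): string {
    // Proper Error instances carry a message directly.
    if (errorOrMessage instanceof Error) {
        return errorOrMessage.message;
    }
    // Some code throws plain strings.
    if (typeof errorOrMessage === 'string') {
        return errorOrMessage;
    }
    // Error-like objects may still expose a string `message` property.
    if (errorOrMessage !== null && typeof errorOrMessage === 'object' && 'message' in errorOrMessage) {
        const message = (errorOrMessage as { message: unknown }).message;
        if (typeof message === 'string') {
            return message;
        }
    }
    // Fall back to reporting the type and a string form of the value.
    return `Unknown error of type ${typeof errorOrMessage}: ${String(errorOrMessage)}`;
}

const errorMessages: string[] = [];
errorMessages.push(describeError(new Error('Request failed!')));
errorMessages.push(describeError('timeout'));
errorMessages.push(describeError(42));
console.log(errorMessages);
```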