Version: Next

@crawlee/utils

Index

References

tryAbsoluteURL

Re-exports tryAbsoluteURL

Type Aliases

CheerioRoot

CheerioRoot: ReturnType<typeof load>

SearchParams

SearchParams: string | URLSearchParams | Record<string, string | number | boolean | null | undefined>
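Because the alias accepts three different shapes, code that consumes it usually has to normalize the value first. The following is a minimal sketch of one way to do that. The helper `toURLSearchParams` is hypothetical and not part of `@crawlee/utils`, and treating `null` and `undefined` as "omit this parameter" is an assumption, not documented library behavior.

```typescript
// Local copy of the alias documented above, so the sketch is self-contained.
type SearchParams = string | URLSearchParams | Record<string, string | number | boolean | null | undefined>;

// Hypothetical helper (not part of @crawlee/utils): normalize any SearchParams value.
function toURLSearchParams(params: SearchParams): URLSearchParams {
    if (typeof params === 'string' || params instanceof URLSearchParams) {
        // The URLSearchParams constructor accepts both a query string and another instance.
        return new URLSearchParams(params);
    }
    const result = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
        // Assumption: null and undefined mean "leave this parameter out".
        if (value === null || value === undefined) continue;
        result.append(key, String(value));
    }
    return result;
}

const fromString = toURLSearchParams('q=crawlee&page=2');
const fromRecord = toURLSearchParams({ q: 'crawlee', page: 2, debug: false, cursor: null });
```

With this sketch, both a raw query string and a plain object end up as a `URLSearchParams` instance that can be appended to a URL.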

SitemapUrl

SitemapUrl: SitemapUrlData & { originSitemapUrl: string }

Variables

const CLOUDFLARE_RETRY_CSS_SELECTORS

CLOUDFLARE_RETRY_CSS_SELECTORS: string[] = ...

const RETRY_CSS_SELECTORS

RETRY_CSS_SELECTORS: string[] = ...

CSS selectors for elements that should trigger a retry, as the crawler is likely getting blocked.
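A list like this is typically checked against the parsed page after each request. The sketch below shows the general idea only. The selector values are placeholders rather than the real contents of `RETRY_CSS_SELECTORS`, and `findBlockedSelectors` is a hypothetical helper. The `hasMatch` callback stands in for whatever DOM library is used; with a `CheerioRoot` named `$` it could be `(s) => $(s).length > 0`.

```typescript
// Placeholder selectors for illustration only; the real values live in RETRY_CSS_SELECTORS.
const exampleRetrySelectors: string[] = ['#challenge-form', 'iframe.captcha'];

// Hypothetical helper: returns the selectors that match at least one element on the page.
// `hasMatch` abstracts the DOM library so the sketch has no external dependencies.
function findBlockedSelectors(selectors: string[], hasMatch: (selector: string) => boolean): string[] {
    return selectors.filter((selector) => hasMatch(selector));
}

// Stand-in for a parsed page: the set of selectors that would match on it.
const fakePage = new Set(['#challenge-form']);
const found = findBlockedSelectors(exampleRetrySelectors, (s) => fakePage.has(s));

// Any match suggests the response is a block page and the request should be retried.
const shouldRetry = found.length > 0;
```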

const ROTATE_PROXY_ERRORS

ROTATE_PROXY_ERRORS: string[] = ...

Content of proxy errors that should trigger a retry, as the proxy is likely blocked or malfunctioning.
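One plausible way to use such a list is to test whether an error message contains any of the entries. The sketch below assumes plain substring matching, which this page does not confirm. The entries in `exampleProxyErrors` are placeholders, and `shouldRotateProxy` is a hypothetical helper, not a library function.

```typescript
// Placeholder entries for illustration only; the real values live in ROTATE_PROXY_ERRORS.
const exampleProxyErrors: string[] = ['ECONNRESET', 'ECONNREFUSED'];

// Hypothetical helper: true when the error message contains any of the given fragments.
// Assumption: entries are matched as plain substrings of the message.
function shouldRotateProxy(error: unknown, patterns: string[]): boolean {
    const message = error instanceof Error ? error.message : String(error);
    return patterns.some((pattern) => message.includes(pattern));
}

const proxyFailure = shouldRotateProxy(new Error('connect ECONNREFUSED 10.0.0.1:8000'), exampleProxyErrors);
const otherFailure = shouldRotateProxy(new Error('Unexpected token < in JSON'), exampleProxyErrors);
```

Here `proxyFailure` is `true` and `otherFailure` is `false`, so only the connection error would lead to a retry through a different proxy.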

const URL_NO_COMMAS_REGEX

URL_NO_COMMAS_REGEX: RegExp = ...

Default regular expression to match URLs in a string that may be plain text, JSON, CSV or another format. It supports common URL characters but does not match URLs containing commas or spaces. URLs may also contain Unicode letters (but not symbols).

const URL_WITH_COMMAS_REGEX

URL_WITH_COMMAS_REGEX: RegExp = ...

Regular expression that matches everything the default URL_NO_COMMAS_REGEX does and additionally allows commas in the URL path and query. Note, however, that this may prevent parsing URLs from comma-delimited lists, or the extracted URLs may be malformed.
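The trade-off between the two expressions can be illustrated with simplified stand-ins. The patterns below are not the library's regular expressions; they are deliberately minimal assumptions that differ only in whether a comma ends a match. The names `noCommasSketch` and `withCommasSketch` and the example URLs are hypothetical.

```typescript
// Simplified stand-ins, NOT the actual URL_NO_COMMAS_REGEX / URL_WITH_COMMAS_REGEX.
const noCommasSketch = /https?:\/\/[^\s,"']+/g;   // a comma ends the match
const withCommasSketch = /https?:\/\/[^\s"']+/g;  // a comma is part of the match

function extractUrls(text: string, pattern: RegExp): string[] {
    // String.prototype.match with a global regex returns all matches, or null if there are none.
    return text.match(pattern) ?? [];
}

// A comma-delimited list: only the comma-excluding pattern separates the two URLs.
const csvLine = 'https://a.example/x,https://b.example/y';
const csvStrict = extractUrls(csvLine, noCommasSketch);   // two URLs
const csvLoose = extractUrls(csvLine, withCommasSketch);  // one merged, malformed URL

// A URL with a comma in its query: only the comma-allowing pattern keeps it whole.
const geoText = 'see https://maps.example/?ll=50.1,14.4 for details';
const geoStrict = extractUrls(geoText, noCommasSketch);   // truncated at the comma
const geoLoose = extractUrls(geoText, withCommasSketch);  // full URL
```

Under these assumptions, the comma-excluding pattern suits CSV-like input, while the comma-allowing pattern suits free text in which commas legitimately appear inside URLs.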