Version: Next

Sitemap

Loads one or more sitemaps from given URLs, following references in sitemap index files, and exposes the contained URLs.

Example usage:

// Load a sitemap
const sitemap = await Sitemap.load(['https://example.com/sitemap.xml', 'https://example.com/sitemap_2.xml.gz']);

// Enqueue all the contained URLs (including those from sub-sitemaps from sitemap indexes)
await crawler.addRequests(sitemap.urls);

Constructors

constructor

  • new Sitemap(urls: string[]): Sitemap
  • Parameters

    • urls: string[]

    Returns Sitemap

Properties

readonly urls

urls: string[]

Methods

static fromXmlString

  • fromXmlString(content: string, proxyUrl?: string): Promise<Sitemap>
  • Parse XML sitemap content from a string and return URLs of referenced pages. If the sitemap references other sitemaps, they will be loaded via HTTP.


    Parameters

    • content: string

      XML sitemap content

    • optional proxyUrl: string

      URL of a proxy to be used for fetching sitemap contents

    Returns Promise<Sitemap>
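
To make the input format concrete, here is a minimal sketch of the two kinds of XML content such a method has to tell apart: a regular sitemap (`<urlset>`) whose `<loc>` entries are page URLs, and a sitemap index (`<sitemapindex>`) whose `<loc>` entries point at further sitemaps. The helper `parseSitemapXml` and the `ParsedSitemap` type are hypothetical names invented for this illustration. The regex-based reading is a simplification and is not how `Sitemap.fromXmlString` is implemented.

```typescript
// Illustrative only: a tiny regex-based reader, not the parser that Sitemap uses.
interface ParsedSitemap {
    kind: 'urlset' | 'sitemapindex';
    locations: string[];
}

function parseSitemapXml(content: string): ParsedSitemap {
    // A sitemap index has <sitemapindex> as its root element; a regular sitemap has <urlset>.
    const kind = /<sitemapindex[\s>]/.test(content) ? 'sitemapindex' : 'urlset';

    const locations: string[] = [];
    const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
    let match: RegExpExecArray | null;
    while ((match = locPattern.exec(content)) !== null) {
        // Undo the XML escaping of "&" inside URLs.
        locations.push(match[1].replace(/&amp;/g, '&'));
    }
    return { kind, locations };
}

const exampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about?x=1&amp;y=2</loc></url>
</urlset>`;

const parsed = parseSitemapXml(exampleXml);
console.log(parsed.kind, parsed.locations);
```

When the content is a sitemap index, the extracted locations are the sub-sitemaps that would still need to be fetched, which is where the optional proxy URL becomes relevant.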

static load

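  • load(urls: string | string[], proxyUrl?: string, parseSitemapOptions?: ParseSitemapOptions): Promise<Sitemap>
  • Fetch sitemap content from the given URL or URLs and return the URLs of referenced pages.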


    Parameters

    • urls: string | string[]

      URL or URLs of the sitemap(s) to load

    • optional proxyUrl: string

      URL of a proxy to be used for fetching sitemap contents

    • optional parseSitemapOptions: ParseSitemapOptions

    Returns Promise<Sitemap>
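
The idea of accepting one or many sitemap URLs and following references in sitemap index files can be sketched as a small queue-based traversal. Everything named here (`collectUrls`, `FetchText`, `extractLocs`, `fakeSite`, `fakeFetch`) is hypothetical and exists only for this illustration. The fetcher is an in-memory stand-in for HTTP so the sketch runs offline, and the cycle guard is an assumption of the sketch rather than a documented behavior of `Sitemap.load`.

```typescript
// Hypothetical fetcher type: maps a sitemap URL to its XML text.
type FetchText = (url: string) => Promise<string>;

// Pull the text of every <loc> element out of the XML (simplified, regex-based).
function extractLocs(xml: string): string[] {
    const locs: string[] = [];
    const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(xml)) !== null) locs.push(m[1]);
    return locs;
}

async function collectUrls(urls: string | string[], fetchText: FetchText): Promise<string[]> {
    // Accept a single URL or a list, mirroring the `string | string[]` parameter.
    const queue = Array.isArray(urls) ? [...urls] : [urls];
    const visited = new Set<string>();
    const pageUrls: string[] = [];

    while (queue.length > 0) {
        const sitemapUrl = queue.shift() as string;
        if (visited.has(sitemapUrl)) continue; // skip sitemaps already processed
        visited.add(sitemapUrl);

        const xml = await fetchText(sitemapUrl);
        if (/<sitemapindex[\s>]/.test(xml)) {
            queue.push(...extractLocs(xml)); // index file: entries are further sitemaps
        } else {
            pageUrls.push(...extractLocs(xml)); // regular sitemap: entries are page URLs
        }
    }
    return pageUrls;
}

// In-memory stand-in for HTTP, so the sketch runs without network access.
const fakeSite: Record<string, string> = {
    'https://example.com/sitemap.xml':
        '<sitemapindex><sitemap><loc>https://example.com/posts.xml</loc></sitemap></sitemapindex>',
    'https://example.com/posts.xml':
        '<urlset><url><loc>https://example.com/post-1</loc></url>'
        + '<url><loc>https://example.com/post-2</loc></url></urlset>',
};
const fakeFetch: FetchText = async (url) => fakeSite[url] ?? '';

collectUrls('https://example.com/sitemap.xml', fakeFetch).then((found) => console.log(found));
```

Injecting the fetcher keeps the traversal logic separate from transport concerns such as proxies or gzip decompression, which the real method handles internally.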

static tryCommonNames

  • tryCommonNames(url: string, proxyUrl?: string): Promise<Sitemap>
  • Try to load a sitemap from the most common locations: /sitemap.xml and /sitemap.txt. To load sitemaps listed in the Sitemap entries of robots.txt, use the RobotsFile class instead.


    Parameters

    • url: string

      The domain URL to fetch the sitemap for.

    • optional proxyUrl: string

      A proxy to be used for fetching the sitemap file.

    Returns Promise<Sitemap>
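
As a rough illustration of what "the most common locations" means for a given domain URL, the sketch below derives the two candidate sitemap URLs with the standard `URL` class. The helper name `commonSitemapUrls` is hypothetical, and this shows only the URL derivation, not how `tryCommonNames` fetches or parses the files.

```typescript
// Illustrative only: derive the /sitemap.xml and /sitemap.txt candidates for a site.
function commonSitemapUrls(url: string): string[] {
    // Resolving an absolute path against the input discards its own path and query,
    // so only the origin of the input URL is kept.
    return ['/sitemap.xml', '/sitemap.txt'].map((path) => new URL(path, url).href);
}

console.log(commonSitemapUrls('https://example.com/blog/?page=2'));
```

Because the candidates are resolved against the origin, passing a deep page URL and passing the bare domain yield the same two locations.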