Version: Next

RouterHandler <Context, Routes>

A simple router that dispatches requests based on their labels. A router instance can be passed as the requestHandler of your crawler.

import { Router, CheerioCrawler, CheerioCrawlingContext } from 'crawlee';

const router = Router.create<CheerioCrawlingContext>();

// we can also use factory methods for specific crawling contexts; the above is equivalent to:
// import { createCheerioRouter } from 'crawlee';
// const router = createCheerioRouter();

router.addHandler('label-a', async (ctx) => {
    ctx.log.info('...');
});
router.addDefaultHandler(async (ctx) => {
    ctx.log.info('...');
});

const crawler = new CheerioCrawler({
    requestHandler: router,
});
await crawler.run();

Alternatively, we can use the default router instance from the crawler object:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler();

crawler.router.addHandler('label-a', async (ctx) => {
    ctx.log.info('...');
});
crawler.router.addDefaultHandler(async (ctx) => {
    ctx.log.info('...');
});

await crawler.run();

For convenience, we can also define the routes right when creating the router:

import { CheerioCrawler, createCheerioRouter } from 'crawlee';
const crawler = new CheerioCrawler({
    requestHandler: createCheerioRouter({
        'label-a': async (ctx) => { ... },
        'label-b': async (ctx) => { ... },
    }),
});
await crawler.run();

Middlewares are also supported via the router.use method. A single router can have multiple middlewares; they are executed sequentially, in the order in which they were registered.

crawler.router.use(async (ctx) => {
    ctx.log.info('...');
});
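To make the ordering concrete, here is a minimal standalone sketch. It does not use Crawlee at all: SketchRouter, SketchContext and the trace array are hypothetical names invented for illustration, and the class only mimics the documented behavior (middlewares run sequentially in registration order, then the handler matching the label runs). It is not how the library is implemented.

```typescript
// Simplified stand-in for a crawling context; not Crawlee's real type.
interface SketchContext {
    label?: string;
    trace: string[];
}

type SketchHandler = (ctx: SketchContext) => void | Promise<void>;

// Hypothetical mini-router that models the documented ordering only.
class SketchRouter {
    private readonly middlewares: SketchHandler[] = [];
    private readonly routes = new Map<string, SketchHandler>();

    use(middleware: SketchHandler): void {
        this.middlewares.push(middleware);
    }

    addHandler(label: string, handler: SketchHandler): void {
        this.routes.set(label, handler);
    }

    async run(ctx: SketchContext): Promise<void> {
        // Middlewares run one after another, in registration order.
        for (const middleware of this.middlewares) {
            await middleware(ctx);
        }
        // Only afterwards is the handler matching the label invoked.
        const handler = ctx.label === undefined ? undefined : this.routes.get(ctx.label);
        if (handler) {
            await handler(ctx);
        }
    }
}

const sketchRouter = new SketchRouter();
sketchRouter.use(async (ctx) => { ctx.trace.push('first middleware'); });
sketchRouter.use(async (ctx) => { ctx.trace.push('second middleware'); });
sketchRouter.addHandler('label-a', async (ctx) => { ctx.trace.push('handler label-a'); });

const sketchCtx: SketchContext = { label: 'label-a', trace: [] };
// Once this promise resolves, sketchCtx.trace holds the call order.
const sketchDone = sketchRouter.run(sketchCtx);
```

Because each middleware is awaited before the next one starts, a middleware can safely prepare state on the context that later middlewares and the route handler rely on.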

To get request.userData typed per label, declare a route map and pass it as the second type argument. The label passed to Router.addHandler then drives the type of request.userData, and unknown labels are rejected at compile time:

import { createCheerioRouter, CheerioCrawlingContext } from 'crawlee';

interface Routes {
    PRODUCT: { sku: string; price: number };
    CATEGORY: { categoryId: string };
}

const router = createCheerioRouter<CheerioCrawlingContext, Routes>();

router.addHandler('PRODUCT', async ({ request }) => {
    request.userData.sku; // string
    request.userData.price; // number
});

router.addHandler('TYPO', async () => {}); // compile error: not a known label
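The following standalone sketch shows one way a route map can drive the handler's userData type through a generic parameter. It is an illustration only: TypedSketchRouter, TypedSketchContext, SketchRoutes and the dispatch method are hypothetical names, and nothing here is taken from Crawlee's implementation.

```typescript
// Hypothetical route map, shaped like the Routes interface above.
interface SketchRoutes {
    PRODUCT: { sku: string; price: number };
    CATEGORY: { categoryId: string };
}

// Minimal stand-in for a context whose request.userData depends on the label.
interface TypedSketchContext<UserData> {
    request: { label: string; userData: UserData };
}

class TypedSketchRouter<Routes extends object> {
    private readonly handlers = new Map<keyof Routes, (ctx: TypedSketchContext<any>) => void>();

    // K is inferred from the label argument and used to index the route map,
    // so the handler sees request.userData as Routes[K].
    addHandler<K extends keyof Routes>(
        label: K,
        handler: (ctx: TypedSketchContext<Routes[K]>) => void,
    ): void {
        this.handlers.set(label, handler);
    }

    // Illustrative helper: invokes the handler registered for the label, if any.
    dispatch<K extends keyof Routes & string>(label: K, userData: Routes[K]): void {
        const handler = this.handlers.get(label);
        if (handler) {
            handler({ request: { label, userData } });
        }
    }
}

const typedRouter = new TypedSketchRouter<SketchRoutes>();
const seenSkus: string[] = [];

typedRouter.addHandler('PRODUCT', ({ request }) => {
    seenSkus.push(request.userData.sku); // userData.sku is typed as string here
});

// typedRouter.addHandler('TYPO', () => {}); // would not compile: 'TYPO' is not a key of SketchRoutes

typedRouter.dispatch('PRODUCT', { sku: 'A-1', price: 10 });
```

The key design choice in the sketch is that the label is the only thing the caller has to spell out; the userData shape follows from it, so handlers and route map cannot drift apart silently.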

Hierarchy

  • Router<Context, Routes>
    • RouterHandler

Callable

  • RouterHandler(ctx): Awaitable<void>

  • Parameters

    • ctx: Context

    Returns Awaitable<void>

Index

Methods

inherited addDefaultHandler

  • addDefaultHandler<UserData>(handler): void
  • Registers default route handler. As a fallback it can receive any request (including labels not declared in the route map), so request.userData defaults to the context's userData type (loosely typed by default). Pass an explicit UserData type argument to narrow it.


    Parameters

    • handler: (ctx) => Awaitable<void>

      Returns void

    inherited addHandler

    • addHandler<Label>(label, handler): void
    • addHandler<UserData>(label, handler): void
    • Registers new route handler for given label. When the router declares a route map, the label is restricted to the declared labels and request.userData is typed accordingly.


      Parameters

      • label: Label
      • handler: (ctx) => Awaitable<void>

        Returns void

      inherited getHandler

      • getHandler(label): (ctx) => Awaitable<void>
      • Returns route handler for given label. If no label is provided, the default request handler will be returned.


        Parameters

        • optional label: string | symbol

        Returns (ctx) => Awaitable<void>

          • (ctx): Awaitable<void>
          • Parameters

            • ctx: Context

            Returns Awaitable<void>
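The lookup rule described for getHandler can be modeled with a small standalone sketch. LookupSketchRouter, LookupHandler and DEFAULT_ROUTE are hypothetical names used only for illustration. The fallback to the default handler follows the descriptions on this page; throwing when neither a matching nor a default handler exists is an assumption of the sketch, not documented behavior.

```typescript
// Handlers in this sketch return a string so the chosen route is easy to observe.
type LookupHandler = (ctx: { label?: string }) => string;

// Hypothetical key under which the sketch stores the default handler.
const DEFAULT_ROUTE = Symbol('default');

class LookupSketchRouter {
    private readonly routes = new Map<string | symbol, LookupHandler>();

    addHandler(label: string | symbol, handler: LookupHandler): void {
        this.routes.set(label, handler);
    }

    addDefaultHandler(handler: LookupHandler): void {
        this.routes.set(DEFAULT_ROUTE, handler);
    }

    getHandler(label?: string | symbol): LookupHandler {
        // A registered label wins.
        if (label !== undefined && this.routes.has(label)) {
            return this.routes.get(label)!;
        }
        // No label (or no handler for it): fall back to the default handler.
        const fallback = this.routes.get(DEFAULT_ROUTE);
        if (fallback) {
            return fallback;
        }
        // Assumption of this sketch: fail loudly when nothing can handle the request.
        throw new Error(`No handler registered for label: ${String(label)}`);
    }
}

const lookupRouter = new LookupSketchRouter();
lookupRouter.addHandler('label-a', () => 'handled by label-a');
lookupRouter.addDefaultHandler(() => 'handled by default');

const byLabel = lookupRouter.getHandler('label-a')({ label: 'label-a' });
const byDefault = lookupRouter.getHandler()({});
```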

      inherited use

      • use(middleware): void
      • Registers a middleware that will be fired before the matching route handler. Multiple middlewares can be registered; they are fired in the order in which they were registered.


        Parameters

        • middleware: (ctx) => Awaitable<void>

          Returns void