ContextPipeline

crawlee.basic_crawler.context_pipeline.ContextPipeline

Encapsulates the logic of gradually enhancing the crawling context with additional information and utilities.

The enhancement is done by a chain of middlewares that are added to the pipeline after its creation.

Constructors

__init__

  • __init__(*, _middleware, _parent): None
  • Parameters

    • _middleware: Callable[[TCrawlingContext], AsyncGenerator[TMiddlewareCrawlingContext, None]] | None = None (keyword-only)
    • _parent: ContextPipeline[BasicCrawlingContext] | None = None (keyword-only)

    Returns None

Methods

__call__

  • async __call__(crawling_context, final_context_consumer): None
  • Run a crawling context through the middleware chain and pipe it into a consumer function.

    Exceptions from the consumer function are wrapped together with the final crawling context.


    Parameters

    • crawling_context: BasicCrawlingContext
    • final_context_consumer: Callable[[TCrawlingContext], Awaitable[None]]

    Returns None

compose

  • compose(middleware): ContextPipeline[TMiddlewareCrawlingContext]
  • Add a middleware to the pipeline.

    The middleware should yield exactly once, and it should yield an (optionally) extended crawling context object. The part before the yield can be used for initialization and the part after it for cleanup.


    Parameters

    • middleware: Callable[[TCrawlingContext], AsyncGenerator[TMiddlewareCrawlingContext, None]]

    Returns ContextPipeline[TMiddlewareCrawlingContext]
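
As a rough illustration of the middleware shape that `compose` expects, the sketch below defines an async generator that initializes, yields one extended context, and then cleans up. It runs without Crawlee: `BaseContext`, `TimedContext`, `timing_middleware`, and the `drive_once` helper are hypothetical names used only for this example, and `drive_once` is a simplified stand-in for whatever the pipeline does internally.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncGenerator
from dataclasses import dataclass


@dataclass
class BaseContext:
    """Hypothetical stand-in for a basic crawling context."""

    url: str


@dataclass
class TimedContext(BaseContext):
    """Hypothetical extended context that adds one field."""

    started_at: float


events: list[str] = []


async def timing_middleware(context: BaseContext) -> AsyncGenerator[TimedContext, None]:
    # Initialization: runs before anything downstream sees the context.
    events.append("init")
    loop = asyncio.get_running_loop()
    # Yield exactly once, handing over the extended context.
    yield TimedContext(url=context.url, started_at=loop.time())
    # Cleanup: runs once the generator is resumed after the yield.
    events.append("cleanup")


async def drive_once(context: BaseContext) -> TimedContext:
    """Simplified stand-in driver: advance to the yield, consume, then resume."""
    generator = timing_middleware(context)
    extended = await generator.__anext__()
    events.append("consume")
    try:
        await generator.__anext__()
    except StopAsyncIteration:
        pass  # Expected, because the middleware yields exactly once.
    return extended


result = asyncio.run(drive_once(BaseContext(url="https://example.com")))
```

A function shaped like `timing_middleware` is what the `middleware` parameter's type describes: it takes the current context and returns an async generator of the extended one.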