Version: Next

BasePydanticAiHtmlExtractor

Base class for the built-in HTML extractors.

An HTML extractor turns a page into a validated Pydantic model with the help of an LLM. This abstract base implements the parts the built-in extractors share: resolving the model, composing the task instructions with the distiller's prompt notes, and accumulating token usage.
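The shared responsibilities described above can be pictured with a small, self-contained sketch. It does not use the library's code: `UsageStats`, `FixedNotesDistiller`, `SketchBaseExtractor`, `FakeRunExtractor`, `record_usage` and the `prompt_notes` attribute are all hypothetical stand-ins, and the way the notes are joined to the instructions (a blank line) is an assumption made for illustration only.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class UsageStats:
    """Hypothetical stand-in for PydanticAiUsageStats."""

    input_tokens: int = 0
    output_tokens: int = 0


class FixedNotesDistiller:
    """Hypothetical distiller stand-in exposing its prompt notes as a string."""

    prompt_notes = "Scripts and styles were removed from the HTML."


class SketchBaseExtractor(ABC):
    def __init__(self, model: str, *, distiller: FixedNotesDistiller, instructions: str) -> None:
        # A real base class would resolve this name into a model object.
        self.model = model
        # Base task instructions first, then the distiller's notes (assumed separator).
        self.instructions = f"{instructions}\n\n{distiller.prompt_notes}"
        self.ai_usage = UsageStats()

    def record_usage(self, input_tokens: int, output_tokens: int) -> None:
        # Every run adds to the same accumulator.
        self.ai_usage.input_tokens += input_tokens
        self.ai_usage.output_tokens += output_tokens

    @abstractmethod
    def extract(self, content: str) -> dict:
        """Concrete extractors decide how the content reaches the model."""


class FakeRunExtractor(SketchBaseExtractor):
    def extract(self, content: str) -> dict:
        # Pretend a run happened; token counts here are invented, not measured.
        self.record_usage(input_tokens=len(content), output_tokens=5)
        return {"length": len(content)}


extractor = FakeRunExtractor(
    "openai:gpt-5.4-nano",
    distiller=FixedNotesDistiller(),
    instructions="Extract the product title.",
)
extractor.extract("<h1>Widget</h1>")
extractor.extract("<h1>Gadget</h1>")
```

After the two fake runs, `extractor.ai_usage` holds the sum of both, which is the behaviour the `ai_usage` property documents for real runs.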

The public interface is the PydanticAiHtmlExtractor protocol. The concrete extractors are PydanticAiDirectExtractor and PydanticAiSelectorExtractor.

Hierarchy

BasePydanticAiHtmlExtractor
- PydanticAiDirectExtractor
- PydanticAiSelectorExtractor

Index

Methods

__init__
extract
set_ai_usage

Properties

ai_usage

Methods

__init__

__init__(model, *, distiller, instructions, usage_limits): None

Initialize a new instance.
Parameters
- model: str | Model
  A provider-prefixed name (e.g. 'openai:gpt-5.4-nano') or a pydantic-ai Model. Credentials are read from the provider's environment variable (e.g. OPENAI_API_KEY) or passed explicitly through a Model instance.
- distiller (keyword-only): PydanticAiHtmlDistiller
  The HTML distiller shaping the LLM input.
- instructions (keyword-only): str
  Base task instructions. The distiller's prompt notes are appended automatically.
- usage_limits (keyword-only): UsageLimits | None
  Optional pydantic-ai UsageLimits applied to every single run.
Returns None

extract

async extract(content, schema, *, scope, cache_tag, additional_instructions): TSchema

Extract a structured instance of schema from content.
Parameters
- content: str | Selector
- schema: type[TSchema]
- scope (optional, keyword-only): str | None = None
- cache_tag (optional, keyword-only): str | None = None
- additional_instructions (optional, keyword-only): str | None = None
Returns TSchema
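
As a rough illustration of the call shape, the sketch below defines a stub with the same `extract` signature and awaits it. `StubExtractor` and `Product` are hypothetical. The stub never contacts an LLM and returns canned data, and a dataclass stands in for the Pydantic model a real extractor would validate. The stub ignores `scope` and `cache_tag`, since this page does not describe what they do.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import TypeVar

TSchema = TypeVar("TSchema")


@dataclass
class Product:
    """Stand-in schema; the real extractors expect a Pydantic model class."""

    title: str
    price: float


class StubExtractor:
    """Hypothetical stub mirroring the documented extract() signature."""

    async def extract(
        self,
        content: str,
        schema: type[TSchema],
        *,
        scope: str | None = None,
        cache_tag: str | None = None,
        additional_instructions: str | None = None,
    ) -> TSchema:
        # A real extractor would hand the distilled content to the model here.
        canned = {"title": "Example widget", "price": 9.99}
        return schema(**canned)


async def main() -> Product:
    extractor = StubExtractor()
    return await extractor.extract(
        "<html><h1>Example widget</h1><span>9.99</span></html>",
        Product,
        additional_instructions="Prices are in USD.",
    )


product = asyncio.run(main())
```

The two positional arguments and the keyword-only options follow the signature above. Everything inside the method body is invented for the example.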

set_ai_usage

set_ai_usage(value): None

Replace the usage accumulator with value.

Lets an external owner share one accumulator across a delegation chain.
Parameters
- value: PydanticAiUsageStats
  The accumulator to adopt.
Returns None
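
The sharing pattern can be sketched without the library. In the example below, `UsageStats`, `SketchExtractor` and `fake_run` are hypothetical stand-ins. Only the `ai_usage` / `set_ai_usage` pairing mirrors the documented interface, and the single `total_tokens` field is an assumption about what an accumulator might track.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UsageStats:
    """Hypothetical stand-in for PydanticAiUsageStats."""

    total_tokens: int = 0


class SketchExtractor:
    def __init__(self) -> None:
        # Each extractor starts with its own private accumulator.
        self._ai_usage = UsageStats()

    @property
    def ai_usage(self) -> UsageStats:
        return self._ai_usage

    def set_ai_usage(self, value: UsageStats) -> None:
        # Adopt the caller's accumulator instead of the private one.
        self._ai_usage = value

    def fake_run(self, tokens: int) -> None:
        # Stand-in for a real LLM run that reports token usage.
        self._ai_usage.total_tokens += tokens


# An external owner creates one accumulator and hands it to both extractors.
shared = UsageStats()
outer = SketchExtractor()
inner = SketchExtractor()
outer.set_ai_usage(shared)
inner.set_ai_usage(shared)

outer.fake_run(120)
inner.fake_run(30)
```

Because both extractors now point at the same object, the owner can read one total from `shared` instead of summing per-extractor counters.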

Properties

ai_usage

ai_usage: PydanticAiUsageStats

Accumulated token usage of this extractor's runs.
