BasePydanticAiHtmlExtractor
Hierarchy
- BasePydanticAiHtmlExtractor
Methods
__init__
Initialize a new instance.
Parameters
model: str | Model
A provider-prefixed name (e.g. 'openai:gpt-5.4-nano') or a pydantic-ai Model. Credentials are read from the provider's environment variable (e.g. OPENAI_API_KEY) or passed explicitly through a Model instance.
keyword-only distiller: PydanticAiHtmlDistiller
The HTML distiller shaping the LLM input.
keyword-only instructions: str
Base task instructions. The distiller's prompt notes are appended automatically.
keyword-only usage_limits: UsageLimits | None
Optional pydantic-ai UsageLimits applied to every run.
Returns None
extract
set_ai_usage
Replace the usage accumulator with value. This lets an external owner share one accumulator across a delegation chain.
Parameters
value: PydanticAiUsageStats
The accumulator to adopt.
Returns None
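The sharing pattern described for set_ai_usage can be sketched roughly as follows. This is a minimal illustration with stand-in classes, not the library's code: UsageStats, SketchExtractor, and record_run are hypothetical names invented here, and the fields of the real PydanticAiUsageStats may differ.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    """Hypothetical stand-in for PydanticAiUsageStats: a mutable token counter."""

    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


class SketchExtractor:
    """Hypothetical stand-in for an extractor; not the library's class."""

    def __init__(self) -> None:
        # Each extractor starts with its own private accumulator.
        self._usage = UsageStats()

    @property
    def ai_usage(self) -> UsageStats:
        return self._usage

    def set_ai_usage(self, value: UsageStats) -> None:
        # Replace (not merge) the accumulator so the owner's object is shared.
        self._usage = value

    def record_run(self, input_tokens: int, output_tokens: int) -> None:
        # Simulates the bookkeeping that would follow one LLM run.
        self._usage.add(input_tokens, output_tokens)


# An external owner hands the same accumulator to every extractor in the chain.
shared = UsageStats()
outer, inner = SketchExtractor(), SketchExtractor()
outer.set_ai_usage(shared)
inner.set_ai_usage(shared)

outer.record_run(100, 20)
inner.record_run(50, 10)
```

Because both extractors hold a reference to the same object, the owner can read the combined totals from shared without collecting them from each extractor separately.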
Properties
ai_usage
Accumulated token usage of this extractor's runs.
Base class for the built-in HTML extractors.
An HTML extractor turns a page into a validated Pydantic model with the help of an LLM. This abstract base implements the parts the built-in extractors share: resolving the model, composing the task instructions with the distiller's prompt notes, and accumulating token usage.
The public interface is the PydanticAiHtmlExtractor protocol. The concrete extractors are PydanticAiDirectExtractor and PydanticAiSelectorExtractor.
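The instruction composition mentioned above, where the distiller's prompt notes are appended to the base task instructions, might look something like the sketch below. compose_instructions is a hypothetical helper written for illustration; the separator, the ordering, and the representation of prompt notes as a list of strings are all assumptions rather than the library's actual behavior.

```python
def compose_instructions(base: str, prompt_notes: list[str]) -> str:
    """Append distiller prompt notes to the base task instructions.

    Hypothetical helper: the blank-line separator and the list-of-strings
    shape of the notes are assumptions made for this sketch.
    """
    # Drop empty or whitespace-only notes so they add no stray blank lines.
    notes = [note.strip() for note in prompt_notes if note.strip()]
    if not notes:
        return base.strip()
    # Base instructions come first; each note follows as its own paragraph.
    return "\n\n".join([base.strip(), *notes])


composed = compose_instructions(
    "Extract the product name and price.",
    ["Attributes were stripped from the HTML.", "  "],
)
```

Keeping the base instructions first means the caller's task description stays the leading context, with the distiller's notes acting as qualifiers on how to read the input.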