PydanticAiDirectExtractor
Hierarchy
- BasePydanticAiHtmlExtractor
- PydanticAiDirectExtractor
Methods
__init__
Initialize a new instance.
Parameters
model: str | Model
A provider-prefixed name (e.g. 'openai:gpt-5.4-nano') or a pydantic-ai Model.
optional keyword-only distiller: PydanticAiHtmlDistiller | None = None
The HTML distiller shaping the LLM input. Defaults to PydanticAiCleanHtmlDistiller.
optional keyword-only instructions: str = _DIRECT_INSTRUCTIONS
Base task instructions. The distiller's prompt notes are appended automatically.
optional keyword-only retries: int = 1
Maximum number of times the model may correct output that fails schema validation within a single run (pydantic-ai output retries).
optional keyword-only usage_limits: UsageLimits | None = None
Optional pydantic-ai UsageLimits applied to every single run.
Returns None
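The retries option follows pydantic-ai's output-retry idea: a validation error is sent back to the model so it can correct its answer. The stdlib-only sketch below imitates that loop without pydantic-ai or a real LLM. run_with_output_retries, parse_product, Product, and the scripted fake model are hypothetical, and the sketch assumes retries counts corrections after the first attempt, so retries=1 allows at most two model calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Product:
    """Hypothetical output schema."""
    name: str
    price: float


class OutputValidationError(Exception):
    """Raised when raw model output does not fit the schema."""


def parse_product(raw: dict) -> Product:
    # Stand-in for schema validation: both fields required, price numeric.
    try:
        return Product(name=str(raw["name"]), price=float(raw["price"]))
    except (KeyError, TypeError, ValueError) as exc:
        raise OutputValidationError(f"invalid output {raw!r}: {exc!r}") from exc


def run_with_output_retries(
    model: Callable[[list], dict], prompt: str, retries: int = 1
) -> Product:
    """Call the model, validate its reply, and feed errors back up to `retries` times."""
    messages = [prompt]
    retries_left = retries
    while True:
        raw = model(messages)
        try:
            return parse_product(raw)
        except OutputValidationError as exc:
            if retries_left <= 0:
                raise  # out of retries: surface the validation error
            retries_left -= 1
            # Send the error back so the next call can correct the output.
            messages.append(f"Validation failed: {exc}. Please fix your output.")


def scripted_model(replies: list) -> tuple:
    """Fake model: returns canned replies in order and logs each prompt history."""
    it: Iterator[dict] = iter(replies)
    calls: list = []

    def model(messages: list) -> dict:
        calls.append(list(messages))
        return next(it)

    return model, calls


# First reply lacks "price", so one correction round is needed.
model, calls = scripted_model([{"name": "Lamp"}, {"name": "Lamp", "price": "19.99"}])
print(run_with_output_retries(model, "Extract the product.", retries=1))
# Product(name='Lamp', price=19.99)
print(len(calls))  # 2
```

With retries=0 the same scripted model would raise OutputValidationError after the first call, which is the trade-off the parameter controls: more retries can rescue malformed output but may cost extra LLM calls per page.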
extract
Distill content, send it to the model, and return a validated instance of schema.
Parameters
content: str | Selector
Raw HTML or a parsed Parsel Selector.
schema: type[TSchema]
The Pydantic model describing the desired output.
optional keyword-only scope: str | None = None
Optional CSS selector restricting extraction to the first matching subtree.
optional keyword-only cache_tag: str | None = None
Ignored in direct extraction.
optional keyword-only additional_instructions: str | None = None
Extra instructions appended for this call only.
Returns TSchema
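The prompt seen by the model combines three sources: the base instructions, the distiller's prompt notes, and any per-call additional_instructions. A minimal sketch of that composition follows. compose_instructions, BASE, and NOTES are hypothetical, and both the ordering (base, then distiller notes, then per-call text) and the blank-line separator are assumptions rather than the extractor's documented behavior.

```python
from typing import Optional


def compose_instructions(
    base: str,
    distiller_notes: Optional[str] = None,
    additional: Optional[str] = None,
) -> str:
    """Join the non-empty instruction parts, separated by blank lines."""
    parts = (base, distiller_notes, additional)
    return "\n\n".join(p.strip() for p in parts if p and p.strip())


BASE = "Extract the requested fields from the page HTML."
NOTES = "Scripts, styles, and most attributes were removed from the HTML."

# Per-call extra instructions affect this prompt only...
print(compose_instructions(BASE, NOTES, "Report prices in EUR."))
# ...a later call without them gets just the base text plus distiller notes.
print(compose_instructions(BASE, NOTES))
```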
set_ai_usage
Replace the usage accumulator with value.
This lets an external owner share one accumulator across a delegation chain.
Parameters
value: PydanticAiUsageStats
The accumulator to adopt.
Returns None
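The point of set_ai_usage is that several extractors can write into one accumulator object, so an owner sees combined token usage. The sketch below uses hypothetical UsageStats and ToyExtractor classes (not PydanticAiUsageStats or this extractor) with made-up token counts, and assumes the accumulator is a mutable object that each run adds to.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    """Hypothetical running token totals (stand-in for PydanticAiUsageStats)."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


class ToyExtractor:
    """Hypothetical extractor that records a fixed, made-up cost per run."""

    def __init__(self, name: str, input_tokens: int, output_tokens: int) -> None:
        self.name = name
        self._cost = (input_tokens, output_tokens)
        self._usage = UsageStats()  # private accumulator until one is adopted

    @property
    def ai_usage(self) -> UsageStats:
        return self._usage

    def set_ai_usage(self, value: UsageStats) -> None:
        # Adopt the caller's accumulator; later runs add to the shared object.
        self._usage = value

    def extract(self, content: str) -> str:
        self._usage.add(*self._cost)  # pretend one LLM call happened
        return f"{self.name} handled {len(content)} chars"


shared = UsageStats()
first = ToyExtractor("first", 100, 20)
fallback = ToyExtractor("fallback", 500, 50)
for extractor in (first, fallback):
    extractor.set_ai_usage(shared)

first.extract("<html>...</html>")
fallback.extract("<html>...</html>")
print(shared)  # UsageStats(input_tokens=600, output_tokens=70)
```

Sharing one object, rather than summing each extractor's ai_usage afterwards, means the totals stay correct even if the owner never learns which inner extractors actually ran.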
Properties
ai_usage
Accumulated token usage of this extractor's runs.
Extractor that asks the LLM to read the page and return the data directly.
The page is distilled to compact HTML and sent to the model in a single call. The user schema is the agent's output type, so pydantic-ai validates the result and feeds invalid output back to the model. This is the simplest extractor and works on any page, at the cost of one LLM call per page.
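To picture the distillation step, here is a stdlib-only sketch that shrinks a page before it would be sent to a model. It is not PydanticAiCleanHtmlDistiller: the CompactHtml parser, the distill function, the dropped tags, and the kept-attribute allow-list are assumptions chosen only to show how removing scripts, styles, comments, whitespace-only text, and most attributes reduces the input size.

```python
import html
import re
from html.parser import HTMLParser

DROP_TAGS = {"script", "style"}  # elements removed together with their contents
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CompactHtml(HTMLParser):
    """Rebuild HTML without scripts, styles, comments, or non-allow-listed attributes."""

    def __init__(self, keep_attrs=("href", "src", "alt")):
        super().__init__(convert_charrefs=True)
        self.keep_attrs = set(keep_attrs)
        self.parts = []
        self._skipping = None  # name of the dropped element we are inside, if any

    def handle_starttag(self, tag, attrs):
        if self._skipping:
            return
        if tag in DROP_TAGS:
            self._skipping = tag
            return
        kept = "".join(
            f' {name}="{html.escape(value)}"'
            for name, value in attrs
            if name in self.keep_attrs and value is not None
        )
        self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if self._skipping:
            if tag == self._skipping:
                self._skipping = None
            return
        if tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skipping or not data.strip():
            return  # skip dropped-element contents and whitespace-only text
        # Collapse whitespace runs and re-escape entities decoded by the parser.
        self.parts.append(html.escape(re.sub(r"\s+", " ", data), quote=False))

    # Comments and doctypes fall through to HTMLParser's no-op handlers.


def distill(page: str) -> str:
    parser = CompactHtml()
    parser.feed(page)
    parser.close()
    return "".join(parser.parts)


page = """<html><head><style>p {color: red}</style><script>track()</script></head>
<body><div class="card" data-id="7"><h2>Lamp</h2>
<p>Price: <b>19.99</b></p><a href="/lamp" onclick="go()">Details</a></div></body></html>"""
print(distill(page))
# <html><head></head><body><div><h2>Lamp</h2><p>Price: <b>19.99</b></p><a href="/lamp">Details</a></div></body></html>
```

Keeping tags (rather than extracting plain text) preserves the structure the model needs to associate labels with values, while dropping class names and event handlers removes tokens that rarely carry data.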
See the PydanticAiHtmlExtractor protocol for the common extractor interface, and PydanticAiSelectorExtractor for a variant that learns reusable CSS selectors.
Usage