BeautifulSoupCrawler

crawlee.beautifulsoup_crawler._beautifulsoup_crawler.BeautifulSoupCrawler

A crawler that fetches the request URL using httpx and parses the result with BeautifulSoup.

Constructors

__init__

  • __init__(*, parser, additional_http_error_status_codes, ignore_http_error_status_codes, **kwargs): None
  • Initialize the BeautifulSoupCrawler.


    Parameters

    • parser: Literal['html.parser', 'lxml', 'xml', 'html5lib'] = 'lxml' (keyword-only)
    • additional_http_error_status_codes: Iterable[int] = () (keyword-only)
    • ignore_http_error_status_codes: Iterable[int] = () (keyword-only)
    • **kwargs: Unpack[BasicCrawlerOptions[BeautifulSoupCrawlingContext]]

    Returns None
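
    Example

    The following is a minimal sketch of how the keyword-only parameters above might be collected and passed to the constructor; it is not taken from the library's own documentation. The helpers `build_crawler_options` and `make_crawler` are hypothetical names introduced for illustration, the status codes 403 and 404 are arbitrary sample values, and the import assumes that the `crawlee.beautifulsoup_crawler` package re-exports `BeautifulSoupCrawler`. Because `Literal` types are not enforced at runtime, the sketch checks the parser name itself before constructing the crawler.

```python
from __future__ import annotations

from typing import Any, Iterable

# Parser names copied from the Literal type of the `parser` parameter.
SUPPORTED_PARSERS = ('html.parser', 'lxml', 'xml', 'html5lib')


def build_crawler_options(
    *,
    parser: str = 'lxml',
    additional_http_error_status_codes: Iterable[int] = (),
    ignore_http_error_status_codes: Iterable[int] = (),
) -> dict[str, Any]:
    """Hypothetical helper: validate and collect the keyword-only arguments."""
    # Literal annotations are not checked at runtime, so validate here.
    if parser not in SUPPORTED_PARSERS:
        raise ValueError(f'Unsupported parser: {parser!r}')
    return {
        'parser': parser,
        # Materialize the iterables so they can be inspected or reused.
        'additional_http_error_status_codes': tuple(additional_http_error_status_codes),
        'ignore_http_error_status_codes': tuple(ignore_http_error_status_codes),
    }


def make_crawler(**options: Any):
    """Hypothetical helper: pass the options to the constructor."""
    # Imported lazily so build_crawler_options works without crawlee installed.
    # Assumption: the package re-exports the class from its private module.
    from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler

    return BeautifulSoupCrawler(**options)


# Sample values only; 403 and 404 are arbitrary illustrations.
options = build_crawler_options(
    parser='html.parser',
    additional_http_error_status_codes=[403],
    ignore_http_error_status_codes=[404],
)
print(options)
```

    With crawlee installed, `make_crawler(**options)` would return the crawler instance. Any further `BasicCrawlerOptions` entries can be supplied as extra keyword arguments alongside these three.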