BeautifulSoupCrawlingContext

The crawling context used by the BeautifulSoupCrawler.

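It provides access to the current request, the HTTP response, and the parsed BeautifulSoup document, along with helper functions for enqueuing links, storing data, and sending further requests.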
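
As a rough illustration of how a request handler might use this context, the sketch below touches only members listed on this page (`request`, `log`, `push_data`, `enqueue_links`, `html_to_text`). The handler name `handle_page` and the helper `make_stub_context` are hypothetical, and the stub stands in for the real context so the example runs offline. With the library itself, a handler like this would be registered on a BeautifulSoupCrawler instead.

```python
import asyncio
import logging
from types import SimpleNamespace


async def handle_page(context) -> None:
    """Hypothetical handler; `context` is assumed to behave like a BeautifulSoupCrawlingContext."""
    context.log.info("Processing %s", context.request.url)
    # Store one record for the page, then queue the links found on it.
    await context.push_data({"url": context.request.url, "text": context.html_to_text()})
    await context.enqueue_links()


def make_stub_context(url: str, text: str):
    """Build a stand-in context for offline experiments (not part of the library)."""
    pushed = []
    enqueue_calls = []

    async def push_data(item):
        pushed.append(item)

    async def enqueue_links(**kwargs):
        enqueue_calls.append(kwargs)

    context = SimpleNamespace(
        request=SimpleNamespace(url=url),
        log=logging.getLogger("context-sketch"),
        push_data=push_data,
        enqueue_links=enqueue_links,
        html_to_text=lambda: text,
    )
    return context, pushed, enqueue_calls


context, pushed, enqueue_calls = make_stub_context("https://example.com", "Hello")
asyncio.run(handle_page(context))
```

After the run, `pushed` holds the single record the handler stored and `enqueue_calls` records one call with no arguments.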

Methods

from_basic_crawling_context

  • from_basic_crawling_context(*, context, http_response): Self
  • Convenience constructor that creates an HttpCrawlingContext from an existing BasicCrawlingContext.


    Parameters

    Returns Self

from_http_crawling_context

  • from_http_crawling_context(*, context, parsed_content, enqueue_links): Self
  • Convenience constructor that creates a new context from an existing HttpCrawlingContext.


    Parameters

    Returns Self

from_parsed_http_crawling_context

  • from_parsed_http_crawling_context(*, context): Self
  • Convenience constructor that creates a new context from an existing ParsedHttpCrawlingContext[BeautifulSoup].


    Parameters

    Returns Self
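
The three `from_*` constructors above follow one pattern: take an existing, less specific context and return a more specific one that carries the same data plus the new fields. The sketch below shows that pattern with small stand-in dataclasses. The class names, field types, and the field-copying approach are assumptions made for illustration, not the library's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BasicContext:
    """Stand-in for a base crawling context (hypothetical, reduced to one field)."""

    url: str


@dataclass(frozen=True)
class HttpContext(BasicContext):
    """Stand-in for a context that adds the HTTP response."""

    http_response: str

    @classmethod
    def from_basic_context(cls, *, context: BasicContext, http_response: str) -> HttpContext:
        # Copy every field of the existing context, then add the new one.
        inherited = {f.name: getattr(context, f.name) for f in fields(context)}
        return cls(http_response=http_response, **inherited)


@dataclass(frozen=True)
class ParsedContext(HttpContext):
    """Stand-in for a context that adds the parsed document."""

    parsed_content: str

    @classmethod
    def from_http_context(cls, *, context: HttpContext, parsed_content: str) -> ParsedContext:
        inherited = {f.name: getattr(context, f.name) for f in fields(context)}
        return cls(parsed_content=parsed_content, **inherited)


basic = BasicContext(url="https://example.com")
http = HttpContext.from_basic_context(context=basic, http_response="<html>...</html>")
parsed = ParsedContext.from_http_context(context=http, parsed_content="parsed tree")
```

Keyword-only parameters (the `*` in each signature) keep call sites explicit about which value is the existing context and which is the new data.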

html_to_text

  • html_to_text(): str
  • Convert the parsed HTML content to newline-separated plain text without tags.


    Returns str
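
To give a feel for what "newline-separated plain text without tags" means, here is a minimal approximation built on the standard library's `html.parser`. It is not the library's implementation. The function name `html_to_text_sketch` is hypothetical, and details such as whitespace handling and where line breaks fall around inline elements may differ from what `html_to_text()` returns.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> bodies."""

    _SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text_sketch(html: str) -> str:
    """Return the text chunks of `html`, one per line (simplified assumption)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.chunks)


sample = (
    "<html><body><h1>Title</h1><p>First <b>bold</b> line.</p>"
    "<script>var x = 1;</script></body></html>"
)
result = html_to_text_sketch(sample)
```

In this simplified version every text node becomes its own line, so inline elements such as `<b>` also introduce a line break.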

Properties

add_requests

add_requests: AddRequestsFunction

enqueue_links

enqueue_links: EnqueueLinksFunction

get_key_value_store

http_response

http_response: HttpResponse

The HTTP response received from the server.

log

log: logging.Logger

parsed_content

parsed_content: TParseResult

proxy_info

proxy_info: ProxyInfo | None

push_data

push_data: PushDataFunction

request

request: Request

send_request

send_request: SendRequestFunction

session

session: Session | None

soup

soup: BeautifulSoup

Convenience alias for parsed_content.

use_state

use_state: UseStateFunction