
ParselCrawlingContext

The crawling context used by the ParselCrawler.

It provides access to key objects as well as utility functions for handling crawling tasks.

Methods

__hash__

  • __hash__(): int
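A context bundles mutable members (loggers, helper callables, response objects), so a hash computed from its fields is not a natural fit. The following is a minimal sketch, assuming an identity-based `__hash__` in which every context instance is treated as unique. `Context` and its `items` field are hypothetical stand-ins, not Crawlee classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for a crawling context (not the Crawlee class)."""

    url: str
    # A mutable field: a generated field-based hash would fail on this list.
    items: list[str] = field(default_factory=list)

    def __hash__(self) -> int:
        # Identity-based hash (assumption): every context instance is considered unique.
        return id(self)


first = Context(url="https://example.com")
second = Context(url="https://example.com")

# Both instances can be dictionary keys even though they compare equal field by field.
handled = {first: "first", second: "second"}
print(first == second, len(handled))  # prints: True 2
```

An explicit `__hash__` defined in the class body is kept by `@dataclass(frozen=True)`, so the identity hash takes precedence over the generated one.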

from_basic_crawling_context

  • from_basic_crawling_context(context, http_response): Self

from_http_crawling_context

  • from_http_crawling_context(context, parsed_content, enqueue_links): Self
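The `from_*` constructors build each context stage from the previous one and add the fields that the new stage contributes: first the HTTP response, then the parsed content and `enqueue_links`. The sketch below illustrates that layering with hypothetical, heavily simplified classes (`BasicContext`, `HttpContext`, `ParsedContext`). The real Crawlee contexts carry many more fields, and this is not their implementation.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable


@dataclass(frozen=True)
class BasicContext:
    url: str


@dataclass(frozen=True)
class HttpContext(BasicContext):
    http_response: str  # hypothetical stand-in for an HttpResponse object

    @classmethod
    def from_basic_crawling_context(cls, context: BasicContext, http_response: str) -> "HttpContext":
        # Copy every field of the earlier stage, then add the HTTP response.
        base = {f.name: getattr(context, f.name) for f in fields(context)}
        return cls(**base, http_response=http_response)


@dataclass(frozen=True)
class ParsedContext(HttpContext):
    parsed_content: Any
    enqueue_links: Callable[..., Any]

    @classmethod
    def from_http_crawling_context(
        cls, context: HttpContext, parsed_content: Any, enqueue_links: Callable[..., Any]
    ) -> "ParsedContext":
        # Copy the HTTP-stage fields, then add the parsed content and the link helper.
        base = {f.name: getattr(context, f.name) for f in fields(context)}
        return cls(**base, parsed_content=parsed_content, enqueue_links=enqueue_links)


basic = BasicContext(url="https://example.com")
http = HttpContext.from_basic_crawling_context(basic, http_response="<html></html>")
parsed = ParsedContext.from_http_crawling_context(http, parsed_content="parsed document", enqueue_links=print)
print(parsed.url, parsed.http_response, parsed.parsed_content)
```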

from_parsed_http_crawling_context

  • from_parsed_http_crawling_context(context): Self
  • Convenience constructor that creates a new context from an existing ParsedHttpCrawlingContext[Selector].


    Parameters

      context: ParsedHttpCrawlingContext[Selector]

    Returns Self
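Assuming the Parsel-specific context adds no data fields of its own (its `selector` is only an alias), converting a generic parsed context into it amounts to copying every field into the subclass. The following is a minimal sketch of that field-copying approach with dataclass-based contexts. `ParsedHttpContext` and `ParselContext` are hypothetical stand-ins, and `parsed_content` holds a plain string instead of a `parsel.Selector`.

```python
from dataclasses import dataclass, fields
from typing import Generic, TypeVar

TParseResult = TypeVar("TParseResult")


@dataclass(frozen=True)
class ParsedHttpContext(Generic[TParseResult]):
    url: str
    parsed_content: TParseResult


@dataclass(frozen=True)
class ParselContext(ParsedHttpContext[str]):
    @property
    def selector(self) -> str:
        # Convenience alias for parsed_content.
        return self.parsed_content

    @classmethod
    def from_parsed_http_crawling_context(cls, context: ParsedHttpContext[str]) -> "ParselContext":
        # Copy every dataclass field of the generic context into the subclass.
        return cls(**{f.name: getattr(context, f.name) for f in fields(context)})


generic = ParsedHttpContext(url="https://example.com", parsed_content="<html></html>")
specific = ParselContext.from_parsed_http_crawling_context(generic)
print(type(specific).__name__, specific.selector)  # prints: ParselContext <html></html>
```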

html_to_text

  • html_to_text(): str
  • Convert the parsed HTML content to newline-separated plain text without tags.


    Returns str
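Conceptually, converting HTML to text means walking the document, dropping the markup, and joining the remaining text blocks with newlines. The sketch below shows that idea with the standard library's `html.parser.HTMLParser` applied to a raw HTML string. It is not Crawlee's implementation, which operates on the parsed `Selector`, and its whitespace rules (one line per non-empty text chunk, `<script>` and `<style>` contents skipped) are assumptions. The `html_to_text` function here is a hypothetical standalone helper.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return newline-separated plain text for an HTML string (hypothetical helper)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.chunks)


print(html_to_text("<h1>Title</h1><p>First  paragraph</p><script>var x;</script><p>Second</p>"))
```

Because each text chunk becomes its own line, inline markup such as `<b>` splits a sentence across lines. A production converter would distinguish block-level from inline elements.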

Properties

add_requests

add_requests: AddRequestsFunction

Helper function for adding new requests to the crawl.

enqueue_links

enqueue_links: EnqueueLinksFunction

Helper function for extracting links from the current page and adding them to the crawl.
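The sketch below covers only the link-extraction step of enqueuing: it collects `href` values from `<a>` elements, resolves them against the page URL, drops duplicates, and keeps links on the same hostname. The same-hostname filter is an assumption about a typical default, and `extract_links` is a hypothetical helper, not part of Crawlee. The real helper also schedules the resulting URLs for crawling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect raw href values from <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return unique absolute same-hostname links from <a href> attributes (hypothetical)."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    base_host = urlparse(base_url).hostname
    links: list[str] = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        if urlparse(absolute).hostname == base_host and absolute not in links:
            links.append(absolute)
    return links


page = '<a href="/about">About</a> <a href="https://other.org/x">Ext</a> <a href="docs/">Docs</a>'
print(extract_links(page, "https://example.com/blog/"))
# prints: ['https://example.com/about', 'https://example.com/blog/docs/']
```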

get_key_value_store

Helper function for obtaining a key-value store.

http_response

http_response: HttpResponse

The HTTP response received from the server.

log

log: logging.Logger

Logger instance.

parsed_content

parsed_content: TParseResult

The parsed content of the HTTP response. For the ParselCrawler, this is a parsel Selector.

proxy_info

proxy_info: ProxyInfo | None

Proxy information for the current page being processed.

push_data

push_data: PushDataFunction

Helper function for storing extracted data in a dataset.
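Inside a request handler, results are typically stored by awaiting `context.push_data(...)`. To keep the sketch runnable without a crawler, it uses a hypothetical `FakeContext` whose `push_data` appends to an in-memory list instead of writing to a dataset. Only the call pattern mirrors Crawlee; the storage does not.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeContext:
    """Hypothetical stand-in exposing only a URL and an async push_data helper."""

    url: str
    dataset: list[dict[str, Any]] = field(default_factory=list)

    async def push_data(self, data: dict[str, Any]) -> None:
        # A real crawler would persist this to a dataset; here it stays in memory.
        self.dataset.append(data)


async def request_handler(context: FakeContext) -> None:
    # Store one record per processed page. The title is hard-coded for the sketch;
    # a real handler would extract it from the page.
    await context.push_data({"url": context.url, "title": "Example Domain"})


ctx = FakeContext(url="https://example.com")
asyncio.run(request_handler(ctx))
print(ctx.dataset)  # prints: [{'url': 'https://example.com', 'title': 'Example Domain'}]
```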

request

request: Request

Request object for the current page being processed.

selector

selector: Selector

Convenience alias for parsed_content.

send_request

send_request: SendRequestFunction

Helper function for sending additional HTTP requests from within the request handler.

session

session: Session | None

Session object for the current page being processed.

use_state

use_state: UseStateFunction

Helper function for accessing state that is shared across requests.
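A shared-state helper hands the handler a state dictionary that later calls receive again, so mutations carry over between requests. The following is a minimal in-memory sketch of that idea, assuming the helper is awaited and accepts a default value. `StateStore` is hypothetical, and unlike the real helper, this sketch does not persist the state anywhere.

```python
from __future__ import annotations

import asyncio
from typing import Any


class StateStore:
    """Hypothetical in-memory backing for a use_state-style helper."""

    def __init__(self) -> None:
        self._state: dict[str, Any] | None = None

    async def use_state(self, default_value: dict[str, Any] | None = None) -> dict[str, Any]:
        # The first call initializes the state; later calls return the same dict object.
        if self._state is None:
            self._state = dict(default_value or {})
        return self._state


async def handle_page(store: StateStore) -> None:
    state = await store.use_state({"pages_seen": 0})
    state["pages_seen"] += 1  # the mutation is visible to later calls


async def main() -> int:
    store = StateStore()
    for _ in range(3):
        await handle_page(store)
    state = await store.use_state()
    return state["pages_seen"]


print(asyncio.run(main()))  # prints: 3
```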