AdaptivePlaywrightCrawlingContext

Methods

__hash__

  • __hash__(): int

from_basic_crawling_context

  • from_basic_crawling_context(context, http_response): Self

from_http_crawling_context

  • from_http_crawling_context(context, parsed_content, enqueue_links): Self

from_parsed_http_crawling_context

from_playwright_crawling_context

parse_with_static_parser

  • Parse the whole page with the static parser. If the selector argument is given, wait for the selector first.

    If the element is not found within the timeout, TimeoutError is raised.


    Parameters

    • optional selector: str | None = None

      CSS selector used to locate a specific element on the page.

    • optional timeout: timedelta = timedelta(seconds=5)

      Timeout that defines how long the function waits for the selector to appear.

    Returns TStaticParseResult
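
A minimal sketch of how a handler might call parse_with_static_parser and treat a timeout as a missing element. StubContext and parse_or_none are hypothetical names used only for illustration; the stub stands in for the real context so the example runs on its own. The sketch assumes the method is awaitable and that the built-in TimeoutError is raised. The description above does not say which TimeoutError class is used, so real code may need to catch a different one.

```python
import asyncio
from datetime import timedelta
from typing import Optional


class StubContext:
    """Hypothetical stand-in for the crawling context (not the library class)."""

    def __init__(self, html, present_selectors):
        self._html = html
        self._present = set(present_selectors)

    async def parse_with_static_parser(self, selector=None, timeout=timedelta(seconds=5)):
        # The stub fails immediately instead of waiting for the timeout to elapse.
        if selector is not None and selector not in self._present:
            raise TimeoutError(f"{selector!r} did not appear within {timeout}")
        return self._html


async def parse_or_none(context, selector: Optional[str] = None):
    """Return the parsed page, or None if the selector never appeared."""
    try:
        return await context.parse_with_static_parser(
            selector=selector, timeout=timedelta(seconds=2)
        )
    except TimeoutError:  # assumption: the built-in TimeoutError is what gets raised
        return None


stub = StubContext("<h1>Title</h1>", {"h1"})
found = asyncio.run(parse_or_none(stub, "h1"))
missing = asyncio.run(parse_or_none(stub, ".price"))
```

Passing a shorter timeout than the 5-second default keeps a handler from stalling on pages where the element is expected to be absent.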

query_selector_all

  • Locate elements by CSS selector and return all elements found.

    If no element is found within the timeout, TimeoutError is raised.


    Parameters

    • selector: str

      CSS selector used to locate a specific element on the page.

    • optional timeout: timedelta = timedelta(seconds=5)

      Timeout that defines how long the function waits for the selector to appear.

    Returns Sequence[TStaticSelectResult]

query_selector_one

  • Locate an element by CSS selector and return the first element found.

    If the element is not found within the timeout, TimeoutError is raised.


    Parameters

    • selector: str

      CSS selector used to locate a specific element on the page.

    • optional timeout: timedelta = timedelta(seconds=5)

      Timeout that defines how long the function waits for the selector to appear.

    Returns TStaticSelectResult | None
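
The two query methods above can be combined in one handler, sketched here under stated assumptions. StubContext and summarize are hypothetical names; the stub represents elements as plain strings, whereas the real element type depends on the configured static parser. Both methods are assumed to be awaitable, and the built-in TimeoutError is assumed to be the exception raised.

```python
import asyncio
from datetime import timedelta


class StubContext:
    """Hypothetical stand-in for the crawling context (not the library class)."""

    def __init__(self, elements):
        # Maps a CSS selector to the list of matching "elements" (plain strings here).
        self._elements = elements

    async def query_selector_all(self, selector, timeout=timedelta(seconds=5)):
        matches = self._elements.get(selector, [])
        if not matches:
            # The stub raises at once rather than waiting for the timeout.
            raise TimeoutError(f"{selector!r} did not appear within {timeout}")
        return matches

    async def query_selector_one(self, selector, timeout=timedelta(seconds=5)):
        return (await self.query_selector_all(selector, timeout))[0]


async def summarize(context):
    """Collect the first heading and the number of links on the page."""
    heading = await context.query_selector_one("h1", timeout=timedelta(seconds=2))
    try:
        links = await context.query_selector_all("a", timeout=timedelta(seconds=2))
    except TimeoutError:  # assumption: the built-in TimeoutError is what gets raised
        links = []  # a page without links is acceptable for this summary
    return {"heading": heading, "link_count": len(links)}


stub = StubContext({"h1": ["Title"], "a": ["/one", "/two"]})
summary = asyncio.run(summarize(stub))
```

Here a missing heading is allowed to propagate as an error, while missing links are treated as an empty result; which elements are mandatory is a per-site decision.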

wait_for_selector

  • async wait_for_selector(selector, timeout): None
  • Locate an element by CSS selector and return None once it is found.

    If the element is not found within the timeout, TimeoutError is raised.


    Parameters

    • selector: str

      CSS selector used to locate a specific element on the page.

    • optional timeout: timedelta = timedelta(seconds=5)

      Timeout that defines how long the function waits for the selector to appear.

    Returns None
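
Because wait_for_selector returns None on success and raises on timeout, one possible usage pattern is to probe several candidate selectors in turn. The sketch below is illustrative only: StubContext and first_available are hypothetical names, and the built-in TimeoutError is assumed to be the exception raised.

```python
import asyncio
from datetime import timedelta


class StubContext:
    """Hypothetical stand-in for the crawling context (not the library class)."""

    def __init__(self, present_selectors):
        self._present = set(present_selectors)

    async def wait_for_selector(self, selector, timeout=timedelta(seconds=5)):
        # The stub raises at once rather than waiting for the timeout.
        if selector not in self._present:
            raise TimeoutError(f"{selector!r} did not appear within {timeout}")
        return None


async def first_available(context, selectors, timeout=timedelta(seconds=1)):
    """Return the first selector that appears, or None if every one times out."""
    for selector in selectors:
        try:
            await context.wait_for_selector(selector, timeout=timeout)
        except TimeoutError:  # assumption: the built-in TimeoutError is what gets raised
            continue
        return selector
    return None


chosen = asyncio.run(first_available(StubContext({".price"}), ["#cost", ".price"]))
```

The selectors are tried sequentially, so the worst-case wait is the per-selector timeout multiplied by the number of candidates.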

Properties

add_requests

add_requests: AddRequestsFunction

Crawling context helper function for adding requests.

enqueue_links

enqueue_links: EnqueueLinksFunction

get_key_value_store

Crawling context helper function for getting a key-value store.

http_response

http_response: HttpResponse

The HTTP response received from the server.

infinite_scroll

infinite_scroll: Callable[[], Awaitable[None]]

A function to perform infinite scrolling on the page.

This scrolls to the bottom, triggering the loading of additional content if present. Raises AdaptiveContextError if accessed during static crawling.

log

log: logging.Logger

Logger instance.

page

page: Page

The Playwright Page object for the current page.

Raises AdaptiveContextError if accessed during static crawling.

parsed_content

parsed_content: TParseResult

proxy_info

proxy_info: ProxyInfo | None

Proxy information for the current page being processed.

push_data

push_data: PushDataFunction

Crawling context helper function for pushing data.

request

request: Request

Request object for the current page being processed.

response

response: Response

The Playwright Response object containing the response details for the current URL.

Raises AdaptiveContextError if accessed during static crawling.
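
Browser-only members such as page, response, and infinite_scroll raise AdaptiveContextError during static crawling, so a handler that wants to use them opportunistically can guard the access. A minimal sketch follows; the AdaptiveContextError class defined here is a local stand-in for the library's exception of the same name, and StaticOnlyContext, BrowserBackedContext, and has_browser_page are hypothetical names used only for illustration.

```python
class AdaptiveContextError(RuntimeError):
    """Local stand-in for the library's exception of the same name."""


class StaticOnlyContext:
    """Hypothetical context as seen during static crawling."""

    @property
    def page(self):
        raise AdaptiveContextError("page is not available during static crawling")


class BrowserBackedContext:
    """Hypothetical context as seen during browser-based crawling."""

    page = object()  # placeholder for a Playwright Page


def has_browser_page(context) -> bool:
    """Report whether the Playwright page can be accessed on this context."""
    try:
        context.page  # merely accessing the property triggers the check
    except AdaptiveContextError:
        return False
    return True


static_result = has_browser_page(StaticOnlyContext())
browser_result = has_browser_page(BrowserBackedContext())
```

The error is raised on access rather than on use, so the guard has to wrap the attribute lookup itself.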

send_request

send_request: SendRequestFunction

Crawling context helper function for sending a request.

session

session: Session | None

Session object for the current page being processed.

use_state

use_state: UseStateFunction

Crawling context helper function for using shared state.