AdaptivePlaywrightCrawlingContext
Hierarchy
- ParsedHttpCrawlingContext
- AdaptivePlaywrightCrawlingContext
Methods
__hash__
Returns int
from_basic_crawling_context
Convenience constructor that creates HttpCrawlingContext from existing BasicCrawlingContext.
Parameters
context: BasicCrawlingContext
http_response: HttpResponse
Returns Self
from_http_crawling_context
Convenience constructor that creates new context from existing HttpCrawlingContext.
Parameters
context: HttpCrawlingContext
parsed_content: TParseResult
enqueue_links: EnqueueLinksFunction
Returns Self
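The convenience constructors above all follow the same shape: take an existing context, add the extra arguments, and return a new, richer context. The following is a minimal sketch of that pattern using plain dataclasses. The classes `BaseContext` and `ParsedContext` and the method `from_base_context` are hypothetical stand-ins, and copying fields with `dataclasses.fields` is an assumption about how such a constructor could work, not a description of the library's implementation.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BaseContext:
    """Hypothetical stand-in for an existing crawling context."""

    url: str
    session_id: str


@dataclass(frozen=True)
class ParsedContext(BaseContext):
    """Hypothetical richer context that adds one extra field."""

    parsed_content: str

    @classmethod
    def from_base_context(cls, context: BaseContext, parsed_content: str) -> "ParsedContext":
        # Copy every field of the existing context, then add the new argument.
        inherited = {f.name: getattr(context, f.name) for f in fields(context)}
        return cls(parsed_content=parsed_content, **inherited)


base = BaseContext(url="https://example.com", session_id="abc")
parsed = ParsedContext.from_base_context(base, parsed_content="<html></html>")
```

The original context is left untouched; the constructor only reads from it.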
from_parsed_http_crawling_context
Convenience constructor that creates new context from existing ParsedHttpCrawlingContext.
Parameters
context: ParsedHttpCrawlingContext[TStaticParseResult]
parser: AbstractHttpParser[TStaticParseResult, TStaticSelectResult]
Returns AdaptivePlaywrightCrawlingContext[TStaticParseResult, TStaticSelectResult]
from_playwright_crawling_context
Convenience constructor that creates new context from existing PlaywrightCrawlingContext.
Parameters
context: PlaywrightCrawlingContext
parser: AbstractHttpParser[TStaticParseResult, TStaticSelectResult]
Returns AdaptivePlaywrightCrawlingContext[TStaticParseResult, TStaticSelectResult]
parse_with_static_parser
Parse the whole page with the static parser. If the selector argument is used, wait for the selector first. If the element is not found within the timeout, TimeoutError is raised.
Parameters
optional selector: str | None = None
CSS selector used to locate a specific element on the page.
optional timeout: timedelta = timedelta(seconds=5)
Timeout that defines how long the function waits for the selector to appear.
Returns TStaticParseResult
query_selector_all
Locate elements by CSS selector and return all elements found. If no element is found within the timeout, TimeoutError is raised.
Parameters
selector: str
CSS selector used to locate specific elements on the page.
optional timeout: timedelta = timedelta(seconds=5)
Timeout that defines how long the function waits for the selector to appear.
Returns Sequence[TStaticSelectResult]
query_selector_one
Locate an element by CSS selector and return the first element found. If the element is not found within the timeout, TimeoutError is raised.
Parameters
selector: str
CSS selector used to locate a specific element on the page.
optional timeout: timedelta = timedelta(seconds=5)
Timeout that defines how long the function waits for the selector to appear.
Returns TStaticSelectResult | None
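As a rough illustration of how the selector methods might be combined in a request handler, the sketch below writes a handler against only the members documented on this page (`query_selector_one`, `query_selector_all`, `push_data`). `FakeContext` is a hypothetical stand-in so the example runs without a crawler; it uses plain strings as elements and returns None for a missing element instead of waiting and raising, which is a simplification.

```python
import asyncio
from datetime import timedelta
from typing import Any, Optional


class FakeContext:
    """Hypothetical stand-in that mimics three documented context members."""

    def __init__(self, elements: dict[str, list[str]]) -> None:
        self._elements = elements
        self.pushed: list[dict[str, Any]] = []

    async def query_selector_one(
        self, selector: str, timeout: timedelta = timedelta(seconds=5)
    ) -> Optional[str]:
        matches = self._elements.get(selector, [])
        return matches[0] if matches else None

    async def query_selector_all(
        self, selector: str, timeout: timedelta = timedelta(seconds=5)
    ) -> list[str]:
        return list(self._elements.get(selector, []))

    async def push_data(self, data: dict[str, Any]) -> None:
        self.pushed.append(data)


async def handler(context: Any) -> None:
    """Request handler that relies only on members listed on this page."""
    title = await context.query_selector_one("h1", timeout=timedelta(seconds=2))
    links = await context.query_selector_all("a")
    await context.push_data({
        # The return type admits None, so guard before using the result.
        "title": title if title is not None else "",
        "link_count": len(links),
    })


fake = FakeContext({"h1": ["Welcome"], "a": ["/docs", "/blog"]})
asyncio.run(handler(fake))
```

Because the handler never touches browser-only members, the same code should be usable whether the page was crawled statically or with a browser.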
wait_for_selector
Locate an element by CSS selector and return None once it is found. If the element is not found within the timeout, TimeoutError is raised.
Parameters
selector: str
CSS selector used to locate a specific element on the page.
optional timeout: timedelta = timedelta(seconds=5)
Timeout that defines how long the function waits for the selector to appear.
Returns None
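The timeout behaviour shared by the selector methods can be sketched with the standard library alone. In the example below, `wait_for_selector_sketch` is a hypothetical stand-in in which the "element" appears after a fixed delay; it uses `asyncio.wait_for` to enforce the `timedelta` timeout. Which exact exception class the library raises is not stated on this page, so the sketch catches `asyncio.TimeoutError` as an assumption.

```python
import asyncio
from datetime import timedelta


async def wait_for_selector_sketch(
    appears_after: float, timeout: timedelta = timedelta(seconds=5)
) -> None:
    """Hypothetical stand-in: the element 'appears' after `appears_after` seconds."""

    async def _element_appears() -> None:
        await asyncio.sleep(appears_after)

    # asyncio.wait_for expects seconds, so the timedelta is converted.
    await asyncio.wait_for(_element_appears(), timeout=timeout.total_seconds())


async def demo() -> list[str]:
    outcomes = []
    for delay in (0.01, 0.5):
        try:
            await wait_for_selector_sketch(delay, timeout=timedelta(milliseconds=100))
            outcomes.append("found")
        except asyncio.TimeoutError:
            # The element did not appear in time; decide here whether to skip or retry.
            outcomes.append("timeout")
    return outcomes
```

Passing a shorter timeout than the 5-second default, as done here, keeps a handler from stalling on pages where the element may never appear.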
Properties
add_requests
Add requests crawling context helper function.
enqueue_links
get_key_value_store
Get key-value store crawling context helper function.
http_response
The HTTP response received from the server.
infinite_scroll
A function to perform infinite scrolling on the page.
This scrolls to the bottom, triggering the loading of additional content if present.
Raises AdaptiveContextError if accessed during static crawling.
log
Logger instance.
page
The Playwright Page object for the current page.
Raises AdaptiveContextError if accessed during static crawling.
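Since browser-only members raise AdaptiveContextError during static crawling, a handler that wants to use them can guard the access. The sketch below models that with stand-ins: the `AdaptiveContextError` class defined here is a local placeholder (its `RuntimeError` base is an assumption), and `SketchAdaptiveContext` and `describe` are hypothetical names used only for illustration.

```python
from typing import Optional


class AdaptiveContextError(RuntimeError):
    """Local placeholder for the library's error; the base class is an assumption."""


class SketchAdaptiveContext:
    """Hypothetical context whose `page` exists only for browser crawling."""

    def __init__(self, page: Optional[object] = None) -> None:
        self._page = page

    @property
    def page(self) -> object:
        if self._page is None:
            raise AdaptiveContextError("Page is not available during static crawling.")
        return self._page


def describe(context: SketchAdaptiveContext) -> str:
    """Report which crawling mode produced the context by probing `page`."""
    try:
        context.page
    except AdaptiveContextError:
        return "static"
    return "browser"
```

The same guard would apply to the other members that document this error, such as `response` and `infinite_scroll`.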
parsed_content
proxy_info
Proxy information for the current page being processed.
push_data
Push data crawling context helper function.
request
Request object for the current page being processed.
response
The Playwright Response object containing the response details for the current URL.
Raises AdaptiveContextError if accessed during static crawling.
send_request
Send request crawling context helper function.
session
Session object for the current page being processed.
use_state
Use state crawling context helper function.
__hash__ returns the hash of the context. Each context is considered unique.
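One way to read "each context is considered unique" is that the hash is tied to the object itself and not to its contents. The sketch below illustrates that reading with a hypothetical `SketchContext` class; hashing on `id(self)` is an assumption made for the example and is not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps the default identity-based comparison
class SketchContext:
    """Hypothetical stand-in for a crawling context."""

    url: str

    def __hash__(self) -> int:
        # Assumption: uniqueness comes from hashing on object identity.
        return id(self)


first = SketchContext(url="https://example.com")
second = SketchContext(url="https://example.com")

# Both objects carry the same data, yet a set keeps them as separate entries.
seen = {first, second}
```

Under this reading, contexts can be used as dictionary keys or set members without two requests for the same URL collapsing into one entry.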