BeautifulSoupCrawlingContext
Hierarchy
- ParsedHttpCrawlingContext
- BeautifulSoupCrawlingContext
Index
Methods
from_basic_crawling_context
Convenience constructor that creates an HttpCrawlingContext from an existing BasicCrawlingContext.
Parameters
optional keyword-only context: BasicCrawlingContext
optional keyword-only http_response: HttpResponse
Returns Self
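The "convenience constructor" pattern described above can be sketched with plain dataclasses. This is a minimal illustration under stated assumptions, not the library's code: BasicContext, HttpContext, and from_basic_context are hypothetical stand-ins for BasicCrawlingContext, HttpCrawlingContext, and from_basic_crawling_context, and the fields shown are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BasicContext:
    """Hypothetical stand-in for BasicCrawlingContext."""

    url: str
    session_id: str | None = None


@dataclass(frozen=True)
class HttpContext(BasicContext):
    """Hypothetical stand-in for HttpCrawlingContext; adds the response."""

    http_response: bytes = b''

    @classmethod
    def from_basic_context(cls, *, context: BasicContext, http_response: bytes) -> HttpContext:
        # Copy every field of the existing context, then add the new one.
        copied = {f.name: getattr(context, f.name) for f in fields(context)}
        return cls(http_response=http_response, **copied)


# Build the richer context from the basic one without listing fields by hand.
basic = BasicContext(url='https://example.com', session_id='abc')
http = HttpContext.from_basic_context(context=basic, http_response=b'<html></html>')
```

Copying fields generically means the derived context keeps working when the base context gains new fields; both arguments are keyword-only, mirroring the parameter list above.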
from_http_crawling_context
Convenience constructor that creates a new context from an existing HttpCrawlingContext.
Parameters
optional keyword-only context: HttpCrawlingContext
optional keyword-only parsed_content: TParseResult
optional keyword-only enqueue_links: EnqueueLinksFunction
Returns Self
from_parsed_http_crawling_context
Convenience constructor that creates a new context from an existing ParsedHttpCrawlingContext[BeautifulSoup].
Parameters
optional keyword-only context: ParsedHttpCrawlingContext[BeautifulSoup]
Returns Self
html_to_text
Convert the parsed HTML content to newline-separated plain text without tags.
Returns str
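To give a feel for what "newline-separated plain text without tags" means, here is a rough standard-library approximation. It is not Crawlee's implementation and will differ from it in whitespace and block handling; html_to_plain_text and _TextCollector are hypothetical names used only for this sketch.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text chunks, skipping script and style bodies."""

    SKIPPED = {'script', 'style'}

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_plain_text(html: str) -> str:
    """Return the text nodes of `html`, one per line, with tags removed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return '\n'.join(collector.chunks)
```

In a request handler, the documented call is simply context.html_to_text(), which takes no arguments and returns a str.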
Properties
add_requests
Add requests crawling context helper function.
enqueue_links
get_key_value_store
Get key-value store crawling context helper function.
http_response
The HTTP response received from the server.
log
Logger instance.
parsed_content
proxy_info
Proxy information for the current page being processed.
push_data
Push data crawling context helper function.
request
Request object for the current page being processed.
send_request
Send request crawling context helper function.
session
Session object for the current page being processed.
soup
Convenience alias.
use_state
Use state crawling context helper function.
The crawling context used by the BeautifulSoupCrawler. It provides access to key objects as well as utility functions for handling crawling tasks.
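A request handler that uses several of the members listed above might look like the following sketch. The function name handle_page and the record layout are assumptions made for illustration, and the context is typed as Any so the snippet does not depend on a particular Crawlee version or import path. It also assumes that soup exposes the parsed BeautifulSoup document and that request has a url attribute. With Crawlee installed, a function like this would typically be registered on the crawler's router (for example with the crawler.router.default_handler decorator).

```python
from typing import Any


async def handle_page(context: Any) -> None:
    """Sketch of a handler; `context` is expected to behave like
    BeautifulSoupCrawlingContext."""
    context.log.info('Processing %s', context.request.url)

    # `soup` is documented as a convenience alias; it is assumed here to be
    # the parsed BeautifulSoup document.
    title_tag = context.soup.title

    record = {
        'url': context.request.url,
        'title': title_tag.string if title_tag else None,
        'text_length': len(context.html_to_text()),
    }

    # Store the extracted record, then queue links found on the page.
    await context.push_data(record)
    await context.enqueue_links()
```

Keeping the handler free of crawler setup makes it easy to exercise with a stand-in context object in tests, without network access.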