Changelog
All notable changes to this project will be documented in this file.
0.5.3 - not yet released
Bug Fixes
- Fix crawler not retrying user handler if there was timeout in the handler (#909) (f4090ef) by @Pijukatel, closes #907
- Optimize memory consumption for HttpxHttpClient, fix proxy handling (#905) (d7ad480) by @Mantisus, closes #895
0.5.2 (2025-01-17)
Bug Fixes
- Avoid use_state race conditions. Remove key argument to use_state (#868) (000b976) by @Pijukatel, closes #856
- Restore proxy functionality for PlaywrightCrawler broken in v0.5 (#889) (908c944) by @Mantisus, closes #887
- Fix the usage of Configuration (#899) (0f1cf6f) by @vdusek, closes #670
0.5.1 (2025-01-07)
Bug Fixes
- Make result of RequestList.is_empty independent of fetch_next_request calls (#876) (d50249e) by @janbuchar
0.5.0 (2025-01-02)
Features
- Add possibility to use None as no proxy in tiered proxies (#760) (0fbd017) by @Pijukatel, closes #687
- Add use_state context method (#682) (868b41e) by @Mantisus, closes #191
- Add pre-navigation hooks router to AbstractHttpCrawler (#791) (0f23205) by @Pijukatel, closes #635
- Add example of how to integrate Camoufox into PlaywrightCrawler (#789) (246cfc4) by @Pijukatel, closes #684
- Expose event types, improve on/emit signature, allow parameterless listeners (#800) (c102c4c) by @janbuchar, closes #561
- Add stop method to BasicCrawler (#807) (6d01af4) by @Pijukatel, closes #651
- Add html_to_text helper function (#792) (2b9d970) by @Pijukatel, closes #659
- [breaking] Implement RequestManagerTandem, remove add_request from RequestList, accept any iterable in RequestList constructor (#777) (4172652) by @janbuchar
Bug Fixes
- Fix circular import in KeyValueStore (#805) (8bdf49d) by @Mantisus, closes #804
- [breaking] Refactor service usage to rely on service_locator (#691) (1d31c6c) by @vdusek, closes #369, #539, #699
- Pass verify in httpx client (#802) (074d083) by @Mantisus, closes #798
- Fix page_options for PlaywrightBrowserPlugin (#796) (bd3bdd4) by @Mantisus, closes #755
- Fix event migrating handler in RequestQueue (#825) (fd6663f) by @Mantisus, closes #815
- Respect user configuration for work with status codes (#812) (8daf4bd) by @Mantisus, closes #708, #756
- abort-on-error for successive runs (#834) (0cea673) by @Mantisus
- Relax ServiceLocator restrictions (#837) (aa3667f) by @janbuchar, closes #806
- Fix typo in exports (#841) (8fa6ac9) by @janbuchar
Refactor
- [breaking] Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance (#746) (9d3c269) by @Pijukatel, closes #350
- [breaking] Remove json_ and order_no from Request (#788) (5381d13) by @Mantisus, closes #94
- [breaking] Rename PwPreNavContext to PwPreNavCrawlingContext (#827) (84b61a3) by @vdusek
- [breaking] Rename PlaywrightCrawler kwargs: browser_options, page_options (#831) (ffc6048) by @Pijukatel
- [breaking] Update the crawlers & storage clients structure (#828) (0ba04d1) by @vdusek, closes #764
0.4.5 (2024-12-06)
Features
Bug Fixes
- Add upper bound of HTTPX version (#775) (b59e34d) by @vdusek
- Fix incorrect use of desired concurrency ratio (#780) (d1f8bfb) by @Pijukatel, closes #759
- Remove pydantic constraint <2.10.0 and update timedelta validator, serializer type hints (#757) (c0050c0) by @Pijukatel
0.4.4 (2024-11-29)
Features
- Expose browser_options and page_options to PlaywrightCrawler (#730) (dbe85b9) by @vdusek, closes #719
- Add abort_on_error property (#731) (6dae03a) by @Mantisus, closes #704
Bug Fixes
0.4.3 (2024-11-21)
Bug Fixes
- Pydantic 2.10.0 issues (#716) (8d8b3fc) by @Pijukatel
0.4.2 (2024-11-20)
Bug Fixes
- Respect custom HTTP headers in PlaywrightCrawler (#685) (a84125f) by @Mantisus
- Fix serialization payload in Request. Fix Docs for Post Request (#683) (e8b4d2d) by @Mantisus, closes #668
- Accept string payload in the Request constructor (#697) (19f5add) by @vdusek
- Fix snapshots handling (#692) (4016c0d) by @Pijukatel
0.4.1 (2024-11-11)
Features
- Add max_crawl_depth option to BasicCrawler (#637) (77deaa9) by @Prathamesh010, closes #460
- Add BeautifulSoupParser type alias (#674) (b2cf88f) by @Pijukatel
Bug Fixes
- Fix total_size usage in memory size monitoring (#661) (c2a3239) by @janbuchar
- Add HttpHeaders to module exports (#664) (f0c5ca7) by @vdusek, closes #663
- Fix unhandled ValueError in request handler result processing (#666) (0a99d7f) by @janbuchar
- Fix BaseDatasetClient.iter_items type hints (#680) (a968b1b) by @Pijukatel
0.4.0 (2024-11-01)
Features
- [breaking] Add headers in unique key computation (#609) (6c4746f) by @Prathamesh010, closes #548
- Add pre_navigation_hooks to PlaywrightCrawler (#631) (5dd5b60) by @Prathamesh010, closes #427
- Add always_enqueue option to bypass URL deduplication (#621) (4e59fa4) by @Rutam21, closes #547
- Split and add extra configuration to export_data method (#580) (6751635) by @deshansh, closes #526
Bug Fixes
- Use strip in headers normalization (#614) (a15b21e) by @vdusek
- [breaking] Merge payload and data fields of Request (#542) (d06fcef) by @vdusek, closes #560
- Default ProxyInfo port if httpx.URL port is None (#619) (8107a6f) by @steffansafey, closes #618
Chore
0.3.9 (2024-10-23)
Features
- Key-value store context helpers (#584) (fc15622) by @janbuchar
- Added get_public_url method to KeyValueStore (#572) (3a4ba8f) by @akshay11298, closes #514
Bug Fixes
- Workaround for JSON value typing problems (#581) (403496a) by @janbuchar, closes #563
0.3.8 (2024-10-02)
Features
- Mask Playwright's "headless" headers (#545) (d1445e4) by @vdusek, closes #401
- Add new model for HttpHeaders (#544) (854f2c1) by @vdusek
Bug Fixes
- Call error_handler for SessionError (#557) (e75ac4b) by @vdusek, closes #546
- Extend from StrEnum in RequestState to fix serialization (#556) (6bf35ba) by @vdusek, closes #551
- Add equality check to UserData model (#562) (899a25c) by @janbuchar
0.3.7 (2024-09-25)
Bug Fixes
- Improve Request.user_data serialization (#540) (de29c0e) by @janbuchar, closes #524
- Adopt new version of curl-cffi (#543) (f6fcf48) by @vdusek
0.3.6 (2024-09-19)
Features
- Add HTTP/2 support for HTTPX client (#513) (0eb0a33) by @vdusek, closes #512
- Expose extended unique key when creating a new Request (#515) (1807f41) by @vdusek
- Add header generator and integrate it into HTTPX client (#530) (b63f9f9) by @vdusek, closes #402
Bug Fixes
0.3.5 (2024-09-10)
Features
- Memory usage limit configuration via environment variables (#502) (c62e554) by @janbuchar
Bug Fixes
- Http clients detect 4xx as errors by default (#498) (1895dca) by @vdusek, closes #496
- Correctly handle log level configuration (#508) (7ea8fe6) by @janbuchar
0.3.4 (2024-09-05)
Bug Fixes
0.3.3 (2024-09-05)
Bug Fixes
- Deduplicate requests by unique key before submitting them to the queue (#499) (6a3e0e7) by @janbuchar
0.3.2 (2024-09-02)
Bug Fixes
- Double incrementation of item_count (#443) (cd9adf1) by @cadlagtrader, closes #442
- Field alias in BatchRequestsOperationResponse (#485) (126a862) by @janbuchar
- JSON handling with Parsel (#490) (ebf5755) by @janbuchar, closes #488
0.3.1 (2024-08-30)
Features
0.3.0 (2024-08-27)
Features
- Implement ParselCrawler that adds support for Parsel (#348) (a3832e5) by @asymness, closes #335
- Add support for filling a web form (#453) (5a125b4) by @vdusek, closes #305