refactor!: update status code handling by Mantisus · Pull Request #1028 · apify/crawlee-python
Merged
Pijukatel merged 2 commits into apify:master on Feb 28, 2025
Mantisus (Collaborator) commented on Feb 26, 2025
Description
- `additional_http_error_status_codes` and `ignore_http_error_status_codes` have been removed from the constructor parameters for HTTP clients.
- The method `_raise_for_error_status_code` has been removed from `HttpClient` and its logic has been moved to `BasicCrawler`.
- Prioritized checking of status codes that indicate `Session` blocking. Codes (401, 403, 429) trigger `retire` for the `Session` and a retry, while other 4XX codes are handled as client errors (errors without retries).
- `additional_http_error_status_codes` is no longer used when checking status codes that indicate `Session` blocking. According to the current documentation, these codes should trigger a plain retry, not `retire` for the `Session` followed by a retry. Blocking status codes can be modified in `SessionPool` through `create_session_settings`.
- Standardized error handling for status codes in both `PlaywrightCrawler` and `HttpCrawler`.
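The order of checks described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not Crawlee's actual code: `classify_status_code`, `Outcome`, and `DEFAULT_BLOCKED_STATUS_CODES` are hypothetical names, and the description says nothing about codes outside the blocked set and the 4XX range, so those are reported as "not covered".

```python
from enum import Enum

# Assumption: the default set is taken from the codes listed in the description.
DEFAULT_BLOCKED_STATUS_CODES = frozenset({401, 403, 429})


class Outcome(Enum):
    SESSION_BLOCKED = "retire session and retry"
    CLIENT_ERROR = "fail without retries"
    OTHER = "not covered by this sketch"


def classify_status_code(status_code, blocked_status_codes=DEFAULT_BLOCKED_STATUS_CODES):
    """Hypothetical helper mirroring the order of checks in the description."""
    # Session-blocking codes are checked first, so 401/403/429 never
    # reach the generic 4XX branch.
    if status_code in blocked_status_codes:
        return Outcome.SESSION_BLOCKED
    # Any remaining 4XX code is treated as a client error (no retries).
    if 400 <= status_code < 500:
        return Outcome.CLIENT_ERROR
    return Outcome.OTHER
```

Passing a different `blocked_status_codes` set stands in for changing the blocked codes through `SessionPool` and `create_session_settings`; the real configuration keys are not shown in this PR description and are not assumed here.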
Issues
- Closes: Refactor additional status codes arguments #998
- Closes: Implementing correct handling of 403 and the other codes that should trigger a SessionError #830
Testing
- Tests for client error status codes now use code 402.
- A separate test has been added for 403, since in this case the number of retries is affected by the `max_session_rotations` parameter.
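Why 402 and 403 need separate tests can be seen in a toy model of the retry loop. The sketch below is a simplified simulation, not the crawler's implementation: `count_attempts` is a hypothetical function, and the resulting count for blocked codes (`max_session_rotations + 1`) is a property of this model only; the real crawler's exact number of requests may differ.

```python
def count_attempts(status_code, max_session_rotations, blocked_status_codes=(401, 403, 429)):
    """Count simulated requests to an endpoint that always returns status_code."""
    attempts = 0
    rotations = 0
    while True:
        attempts += 1
        if status_code in blocked_status_codes:
            # Blocked session: rotate to a new session until the budget is spent.
            if rotations >= max_session_rotations:
                return attempts
            rotations += 1
            continue
        # Any other code (e.g. 402) ends the request after this attempt.
        return attempts
```

In this model a 402 response always costs a single request, while the number of requests for a 403 response grows with `max_session_rotations`, which is why a test asserting a fixed request count cannot cover both.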
Mantisus requested review from Pijukatel and vdusek
Mantisus self-assigned this
Pijukatel approved these changes Feb 27, 2025
Pijukatel (Collaborator) left a comment
Nice. One tiny optional comment. Approved anyway.
tests/unit/crawlers/_http/test_http_crawler.py (outdated, resolved)
Mantisus requested review from janbuchar and removed request for vdusek
janbuchar approved these changes Feb 28, 2025
janbuchar (Collaborator) left a comment
Super nice! Thank you