fix: Improve error handling for `RobotsTxtFile.load` by Mantisus · Pull Request #1524 · apify/crawlee-python
Merged
fix: Improve error handling for `RobotsTxtFile.load` #1524
vdusek merged 2 commits into apify:master
Mantisus (Collaborator) commented on Oct 30, 2025
Description
- This PR adds error handling for `RobotsTxtFile.load`. It prevents crawler failures caused by network errors, DNS errors for non-existent domains (e.g., https://placeholder.com/), or unexpected data formats returned by the /robots.txt page (e.g., https://avatars.githubusercontent.com/robots.txt).
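To illustrate the failure modes listed above, here is a minimal sketch of a defensive robots.txt loader. It is not the code from this PR: it uses the standard library's `urllib.robotparser` rather than Crawlee's `RobotsTxtFile`, the `load_robots_safely` name and the injected `fetch` callable are hypothetical, and falling back to "allow everything" is an assumption about what a reasonable fallback looks like.

```python
from __future__ import annotations

import logging
from collections.abc import Callable
from urllib.robotparser import RobotFileParser

logger = logging.getLogger(__name__)


def load_robots_safely(url: str, fetch: Callable[[str], bytes]) -> RobotFileParser:
    """Parse robots.txt at `url`, falling back to empty rules on failure.

    `fetch` is a hypothetical callable that returns the raw response body
    and may raise OSError (DNS and network failures are OSError subclasses).
    """
    parser = RobotFileParser(url)
    try:
        body = fetch(url)  # may raise on DNS or network errors
        text = body.decode('utf-8')  # may raise on binary or unexpected payloads
    except (OSError, ValueError) as exc:  # UnicodeDecodeError is a ValueError
        logger.warning('Could not load %s (%s); assuming no restrictions', url, exc)
        text = ''  # assumption: an empty rule set, which allows every URL
    parser.parse(text.splitlines())
    return parser
```

With this shape, a crawler can call `load_robots_safely(...).can_fetch('*', page_url)` without wrapping every call site in its own `try`/`except`; whether to allow or deny on failure is a policy choice left to the caller.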
Mantisus requested review from janbuchar and vdusek
Mantisus self-assigned this
vdusek approved these changes Oct 31, 2025
vdusek (Collaborator) left a comment
Could we cover this fix by a test? Otherwise LGTM.
janbuchar removed their request for review
Mantisus mentioned this pull request