fix: Improve error handling for `RobotsTxtFile.load` by Mantisus · Pull Request #1524 · apify/crawlee-python

@Mantisus

Collaborator

Description

  • This PR adds error handling to `RobotsTxtFile.load` to prevent crawler failures when robots.txt cannot be loaded. It covers network errors, DNS errors for non-existent domains (e.g., https://placeholder.com/), and unexpected data formats returned from the /robots.txt page (e.g., https://avatars.githubusercontent.com/robots.txt).
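
The failure modes listed above can be illustrated with a minimal, standard-library sketch. This is not the code from this PR and does not use crawlee's API: `load_robots_safely` and its `fetch` callable are hypothetical names, and falling back to "allow everything" after a failed load is an assumption, since the description does not say what the crawler does in that case.

```python
from __future__ import annotations

import logging
from typing import Callable
from urllib.robotparser import RobotFileParser

logger = logging.getLogger(__name__)


def load_robots_safely(url: str, fetch: Callable[[str], bytes]) -> RobotFileParser:
    """Hypothetical helper: fetch and parse robots.txt without ever raising.

    `fetch` stands in for whatever HTTP client is used; it returns the raw
    response body and may raise on network or DNS failure.
    """
    parser = RobotFileParser(url)
    try:
        body = fetch(url)  # network or DNS errors surface here
        text = body.decode('utf-8')  # binary or mis-encoded payloads fail here
        parser.parse(text.splitlines())
    except Exception as exc:
        # Assumption: an unreadable robots.txt is treated as "no rules".
        logger.warning('Could not load %s (%s); allowing all URLs', url, exc)
        parser.parse([])  # empty rule set, so can_fetch() returns True
    return parser
```

Catching a broad `Exception` is a deliberate choice in this sketch: the point is that no failure while loading robots.txt should stop the crawl, at the cost of hiding the exact cause behind a log message.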

@Mantisus requested review from janbuchar and vdusek

October 30, 2025 17:14

@Mantisus self-assigned this

Oct 30, 2025

Collaborator

@vdusek left a comment

Could we cover this fix with a test? Otherwise LGTM.

Mantisus reacted with thumbs up emoji

@janbuchar removed their request for review

October 31, 2025 13:15

@vdusek merged commit 596a311 into apify:master

Nov 3, 2025

19 checks passed

Reviewers

@vdusek approved these changes

Assignees

@Mantisus

3 participants

@Mantisus @vdusek @apify-service-account