fix: Freeze core `Request` fields by Mantisus · Pull Request #1603 · apify/crawlee-python

@Mantisus

Description

  • Ensures that core Request fields such as unique_key, method, and others will not be changed.

@Mantisus

vdusek

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in principle, but I'm not entirely sure we can do it. We should definitely test this change on the platform - run integration and E2E tests in the SDK.

@janbuchar

I agree in principle, but I'm not entirely sure we can do it. We should definitely test this change on the platform - run integration and E2E tests in the SDK.

I completely agree with this. I don't see a good reason why it shouldn't work, but better safe than sorry.

Once this is ready, no need to wait for my review unless you decide to change the PR substantially.

@Pijukatel

@vdusek

@vdusek

Also do we want to freeze only these 4 fields?

List of all from docs:

image

What about label, max_retries, no_retry, enqueue_strategy, forefront?

@Mantisus

What about label, max_retries, no_retry, enqueue_strategy, forefront?

no_retry - we directly change it in __run_task_function in some situations.
label - I can think of some use cases where you might want to change it. For example, in error_handler.
enqueue_strategy - has a setter, so I think it was also considered a mutable attribute.
forefront - same thing.
max_retries - same thing.

I'm not sure that protecting these attributes is as critical as, for example, unique_key. But now is a good time to decide.

Edit: they're failing

I'll check why it's falling.

Edit: Now the tests are passed without falling

@janbuchar

no_retry - we directly change it in __run_task_function in some situations.

makes sense to change it in user handler as well - e.g., you detect a weird error and want to tell crawlee not to retry the request anymore

label - I can think of some use cases where you might want to change it. For example, in error_handler.

yeah, I guess that it could make sense to change it to "redirect" the request to a different branch of the router 🤷

enqueue_strategy - has a setter, so I think it was also considered a mutable attribute.

This actually doesn't make much sense - it's there for checking if the URL still matches the enqueue_strategy after a redirect

forefront - same thing.

You may want to change this when re-enqueueing a request.

max_retries - same thing.

This is debatable, maybe it's enough to allow setting it when creating the request?

@Mantisus

@Mantisus

it's there for checking if the URL still matches the enqueue_strategy after a redirect

This setter is required for fix #1606.

Since EnqueueLinksFunction can accept a requests parameter with a list of Request objects, this setter is required to set the appropriate strategy for these requests so that the check works correctly after redirection.

vdusek

@vdusek vdusek left a comment

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it again (apify/apify-sdk-python#702) with the current state, and it passes.

LGTM and thanks for thinking through all the options.

Pijukatel