fix: Freeze core `Request` fields by Mantisus · Pull Request #1603 · apify/crawlee-python
Description
- Ensures that core
Requestfields such asunique_key,method, and others will not be changed.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in principle, but I'm not entirely sure we can do it. We should definitely test this change on the platform - run integration and E2E tests in the SDK.
I agree in principle, but I'm not entirely sure we can do it. We should definitely test this change on the platform - run integration and E2E tests in the SDK.
I completely agree with this. I don't see a good reason why it shouldn't work, but better safe than sorry.
Once this is ready, no need to wait for my review unless you decide to change the PR substantially.
What about label, max_retries, no_retry, enqueue_strategy, forefront?
no_retry - we directly change it in __run_task_function in some situations.
label - I can think of some use cases where you might want to change it. For example, in error_handler.
enqueue_strategy - has a setter, so I think it was also considered a mutable attribute.
forefront - same thing.
max_retries - same thing.
I'm not sure that protecting these attributes is as critical as, for example, unique_key. But now is a good time to decide.
Edit: they're failing
I'll check why it's falling.
Edit: Now the tests are passed without falling
no_retry- we directly change it in__run_task_functionin some situations.
makes sense to change it in user handler as well - e.g., you detect a weird error and want to tell crawlee not to retry the request anymore
label- I can think of some use cases where you might want to change it. For example, inerror_handler.
yeah, I guess that it could make sense to change it to "redirect" the request to a different branch of the router 🤷
enqueue_strategy- has a setter, so I think it was also considered a mutable attribute.
This actually doesn't make much sense - it's there for checking if the URL still matches the enqueue_strategy after a redirect
forefront- same thing.
You may want to change this when re-enqueueing a request.
max_retries- same thing.
This is debatable, maybe it's enough to allow setting it when creating the request?
it's there for checking if the URL still matches the enqueue_strategy after a redirect
This setter is required for fix #1606.
Since EnqueueLinksFunction can accept a requests parameter with a list of Request objects, this setter is required to set the appropriate strategy for these requests so that the check works correctly after redirection.
vdusek
left a comment
•
Loading
vdusek
left a comment
•
Loading
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it again (apify/apify-sdk-python#702) with the current state, and it passes.
LGTM and thanks for thinking through all the options.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
