fix: Improve `Request.user_data` serialization by janbuchar ยท Pull Request #540 ยท apify/crawlee-python
label
Sep 23, 2024Comment on lines +168 to +176
| async def test_invalid_user_data_serialization() -> None: | ||
| with pytest.raises(ValidationError): | ||
| Request.from_url( | ||
| 'https://crawlee.dev', | ||
| user_data={ | ||
| 'foo': datetime(year=2020, month=7, day=4, tzinfo=timezone.utc), | ||
| 'bar': {datetime(year=2020, month=4, day=7, tzinfo=timezone.utc)}, | ||
| }, | ||
| ) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure where to put this one, but it is similar to the ones that follow...
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can move it to a separate file test_request.py.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good job, thanks.
I just wonder whether we should not separate "user data" and "our Crawlee metadata". Currently, everything is stored in the user_data field. Let's say the Request would have 2 fields - user_data and crawlee_data / crawlee_metadata. Or is there any problem/drawback with this separation?
Comment on lines +168 to +176
| async def test_invalid_user_data_serialization() -> None: | ||
| with pytest.raises(ValidationError): | ||
| Request.from_url( | ||
| 'https://crawlee.dev', | ||
| user_data={ | ||
| 'foo': datetime(year=2020, month=7, day=4, tzinfo=timezone.utc), | ||
| 'bar': {datetime(year=2020, month=4, day=7, tzinfo=timezone.utc)}, | ||
| }, | ||
| ) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can move it to a separate file test_request.py.
I just wonder whether we should not separate "user data" and "our Crawlee metadata". Currently, everything is stored in the
user_datafield. Let's say theRequestwould have 2 fields -user_dataandcrawlee_data/crawlee_metadata. Or is there any problem/drawback with this separation?
Something something platform ๐ We need to persist this stuff when running on Apify, so it needs to be a part of user_data in the end. I guess we could separate the representation of "Crawlee Request" and "Payload for enqueue request API call" better, probably with/after #94.
Something something platform ๐ We need to persist this stuff when running on Apify, so it needs to be a part of user_data in the end. I guess we could separate the representation of "Crawlee Request" and "Payload for enqueue request API call" better, probably with/after #94.
Thanks. Let's merge it then.
Something something platform ๐
Correct, the request fields are validated, we cant just add something the platform doesn't expect.
janbuchar
deleted the
fix-user-data-serialization
branch
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters