feat: Persist RequestList state by janbuchar · Pull Request #1274 · apify/crawlee-python
Jun 27, 2025
Looks good
Comment on lines +22 to +24
next_index: Annotated[int, Field(alias='nextIndex')] = 0
next_unique_key: Annotated[str | None, Field(alias='nextUniqueKey')] = None
in_progress: Annotated[set[str], Field(alias='inProgress')] = set()
Are the camelCase aliases necessary? AFAIK I also did not use them in FS storage clients.
Probably not. Sessions and Statistics (other instances of recoverable state) use them too. I have no strong opinion here, if you do, say the word and I'll remove them.
Well, so currently we use them in some places and not in others. Up to you then.
janbuchar marked this pull request as ready for review
The request data snapshotting is pretty inefficient: it loads the whole thing into memory (same as the JS version) and stores it uncompressed in the key-value store (the JS version uses gzip).
After quite some trial and error, I believe we can apply the ostrich algorithm for now (that is, deliberately ignore the problem) and optimize if it proves necessary.
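If the optimization does become necessary, gzip compression of the snapshot could look roughly like the sketch below. It uses only the standard library; `pack_snapshot` and `unpack_snapshot` are hypothetical helper names, not functions from this PR or from crawlee, and the sample request dictionaries are made up. Note that it still builds the whole snapshot in memory, so it addresses only the storage size, not the memory use.

```python
import gzip
import json


def pack_snapshot(requests: list[dict]) -> bytes:
    """Serialize the request data to compact JSON and gzip it."""
    raw = json.dumps(requests, separators=(',', ':')).encode('utf-8')
    return gzip.compress(raw)


def unpack_snapshot(blob: bytes) -> list[dict]:
    """Reverse of pack_snapshot: gunzip and parse the JSON."""
    return json.loads(gzip.decompress(blob).decode('utf-8'))


# Made-up sample data: many similar URLs, which compress well.
requests = [
    {'url': f'https://example.com/page/{i}', 'uniqueKey': f'https://example.com/page/{i}'}
    for i in range(1000)
]

blob = pack_snapshot(requests)
raw_size = len(json.dumps(requests, separators=(',', ':')).encode('utf-8'))
```

The resulting bytes would be stored as a binary value in the key-value store and unpacked again when the state is restored.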
Some small comments and one more serious one about the test.
LGTM