feat: Persist RequestList state by janbuchar · Pull Request #1274 · apify/crawlee-python

@janbuchar

@janbuchar

@janbuchar janbuchar added the t-tooling

Issues with this label are in the ownership of the tooling team.

label

Jun 27, 2025

Pijukatel

vdusek

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

vdusek

Comment on lines +22 to +24

next_index: Annotated[int, Field(alias='nextIndex')] = 0
next_unique_key: Annotated[str | None, Field(alias='nextUniqueKey')] = None
in_progress: Annotated[set[str], Field(alias='inProgress')] = set()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the camelCase aliases necessary? AFAIK I also did not use them in FS storage clients.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. Sessions and Statistics (other instances of recoverable state) use them too. I have no strong opinion here, if you do, say the word and I'll remove them.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, so currently somewhere we use them, and somewhere we don't - up to you then.

@janbuchar

@janbuchar janbuchar marked this pull request as ready for review

July 21, 2025 18:35

@janbuchar

The request data snapshotting is pretty inefficient - it loads the whole thing into memory (same as the JS version) and stores it uncompressed in the key-value store (JS version uses gzip).

After quite some trial and error, I believe we can use Ostrich algorithm now and optimize if it proves necessary.

Pijukatel

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments and one more serious about the test.

vdusek

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@janbuchar

@janbuchar

Pijukatel