feat: Dynamic memory snapshots by Pijukatel · Pull Request #1715 · apify/crawlee-python

Pull request overview

This PR introduces dynamic memory snapshot support to address autoscaling limitations in environments with variable memory allocations (e.g., Kubernetes burstable QoS). It adds a Ratio type that allows the autoscaler to dynamically query available system memory rather than being locked to an initial baseline.

Changes:

  • Introduced Ratio type for representing dynamic memory as a proportion of total system memory
  • Modified Snapshotter and MemorySnapshot to accept either ByteSize (fixed) or Ratio (dynamic) for memory limits
  • Added logic to dynamically evaluate memory overload based on current available memory when using Ratio
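
The difference between a fixed and a ratio-based limit can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `FixedLimit`, `RatioLimit`, `resolve_limit_bytes`, `is_overloaded`, and the `get_total_memory` callback are hypothetical names, and plain dataclasses stand in for the real ByteSize and Ratio types.

```python
from dataclasses import dataclass
from typing import Callable, Union

GIB = 1024**3


@dataclass(frozen=True)
class FixedLimit:
    """Hypothetical fixed limit in bytes (stand-in for a ByteSize-like value)."""

    size_bytes: int


@dataclass(frozen=True)
class RatioLimit:
    """Hypothetical limit expressed as a fraction of total system memory."""

    value: float

    def __post_init__(self) -> None:
        # Mirrors the range described in the review: 0.0 < value <= 1.0.
        if not 0.0 < self.value <= 1.0:
            raise ValueError('ratio must satisfy 0.0 < value <= 1.0')


MemoryLimit = Union[FixedLimit, RatioLimit]


def resolve_limit_bytes(limit: MemoryLimit, get_total_memory: Callable[[], int]) -> int:
    """Return the limit in bytes, querying memory only for ratio-based limits."""
    if isinstance(limit, RatioLimit):
        # Re-evaluated on every call, so the limit follows the current allocation.
        return int(limit.value * get_total_memory())
    return limit.size_bytes


def is_overloaded(used_bytes: int, limit: MemoryLimit, get_total_memory: Callable[[], int]) -> bool:
    """Report overload when usage exceeds the currently resolved limit."""
    return used_bytes > resolve_limit_bytes(limit, get_total_memory)
```

Under these assumptions, 3 GiB of usage against `RatioLimit(0.5)` counts as overloaded while the provider reports 4 GiB in total, and stops being overloaded once it reports 8 GiB. A `FixedLimit` of 2 GiB stays overloaded in both cases.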

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/crawlee/_utils/byte_size.py | Adds Ratio Pydantic model with validation for memory ratios (0.0 < value ≤ 1.0) |
| src/crawlee/_autoscaling/snapshotter.py | Updates max_memory_size parameter to accept ByteSize \| Ratio and dynamically calculates memory limits when using Ratio |
| src/crawlee/_autoscaling/_types.py | Modifies MemorySnapshot.is_overloaded to dynamically query system memory when max_memory_size is a Ratio |
| tests/unit/_autoscaling/test_snapshotter.py | Adds comprehensive test simulating memory scale-up/scale-down scenarios with mocked memory info |
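
The scale-up/scale-down test idea can be approximated with a mocked memory provider. The sketch below is an assumption about the general shape of such a test, not the contents of test_snapshotter.py: `overload_history` and `fake_total_memory` are hypothetical, and only `unittest.mock.Mock` from the standard library is used.

```python
from unittest.mock import Mock

GIB = 1024**3


def overload_history(used_bytes: int, ratio: float, get_total_memory, samples: int) -> list[bool]:
    """Take `samples` readings and record whether each one counts as overloaded."""
    history = []
    for _ in range(samples):
        # The limit is recomputed from the provider on every sample.
        limit_bytes = int(ratio * get_total_memory())
        history.append(used_bytes > limit_bytes)
    return history


# Simulated total memory over time: baseline, scale-up, then scale-down.
fake_total_memory = Mock(side_effect=[4 * GIB, 8 * GIB, 2 * GIB])

# Constant 3 GiB of usage against a limit of half the total memory.
history = overload_history(3 * GIB, 0.5, fake_total_memory, samples=3)
```

With constant usage, the overload flag changes only because the mocked total memory changes, which is the behavior a ratio-based limit is meant to provide.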
