feat: Persist `DefaultRenderingTypePredictor` state by Mantisus · Pull Request #1340 · apify/crawlee-python
Pull Request Overview
This PR adds persistence to `DefaultRenderingTypePredictor`: the trained model and its associated data are saved to a key-value store and restored from it, so the predictor keeps the patterns it has learned across crawler runs.
- Adds persistence support with configurable key-value storage integration
- Implements async context manager pattern for proper resource management
- Introduces state serialization/deserialization for scikit-learn models
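The lifecycle described above (restore learned state when the predictor is entered, persist it when it exits) can be sketched with the standard library alone. This is a hypothetical illustration, not the PR's code: `InMemoryKeyValueStore`, `PersistentPredictor`, the `'predictor-state'` key and the per-label counting "model" are all assumptions standing in for Crawlee's key-value store, `RecoverableState` and the scikit-learn model.

```python
from __future__ import annotations

import asyncio
import json
from types import TracebackType
from typing import Any, Literal

# Illustrative labels only; this sketch does not import anything from Crawlee.
RenderingType = Literal['static', 'client only']


class InMemoryKeyValueStore:
    """Hypothetical stand-in for a persistent key-value store."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    async def load(self, key: str) -> Any:
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    async def save(self, key: str, value: Any) -> None:
        # Store JSON text so that only serializable state can be persisted.
        self._data[key] = json.dumps(value)


class PersistentPredictor:
    """Hypothetical predictor that restores its learned state on entry and saves it on exit."""

    def __init__(self, kvs: InMemoryKeyValueStore, persist_key: str = 'predictor-state') -> None:
        self._kvs = kvs
        self._persist_key = persist_key
        # Learned data: how often each rendering type was observed per label.
        self._counts: dict[str, dict[str, int]] = {}
        self._active = False

    async def __aenter__(self) -> PersistentPredictor:
        saved = await self._kvs.load(self._persist_key)
        if saved is not None:
            self._counts = saved  # Resume from the previous run.
        self._active = True
        return self

    async def __aexit__(
        self,
        exc_type: type[BaseException] | None,
        exc: BaseException | None,
        tb: TracebackType | None,
    ) -> None:
        # Persist even when the block raised, so learned data is not lost.
        await self._kvs.save(self._persist_key, self._counts)
        self._active = False

    def _ensure_active(self) -> None:
        if not self._active:
            raise RuntimeError('Use the predictor inside "async with".')

    def store_result(self, label: str, rendering_type: RenderingType) -> None:
        self._ensure_active()
        per_label = self._counts.setdefault(label, {})
        per_label[rendering_type] = per_label.get(rendering_type, 0) + 1

    def predict(self, label: str) -> str:
        self._ensure_active()
        per_label = self._counts.get(label)
        if not per_label:
            return 'client only'  # Nothing learned yet: fall back to browser rendering.
        return max(per_label, key=per_label.__getitem__)


async def main() -> tuple[str, str]:
    kvs = InMemoryKeyValueStore()
    # First run: learn from observed results; state is saved when the block exits.
    async with PersistentPredictor(kvs) as predictor:
        predictor.store_result('product', 'static')
        predictor.store_result('product', 'static')
    # Second run: a fresh instance restores what the first run learned.
    async with PersistentPredictor(kvs) as predictor:
        return predictor.predict('product'), predictor.predict('unknown')


if __name__ == '__main__':
    print(asyncio.run(main()))
```

Saving in `__aexit__` ties persistence to the context manager's lifetime, which is why the predictor refuses to be used outside `async with`. The `'client only'` fallback for unseen labels is a choice made for this sketch only.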
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/crawlee/crawlers/_adaptive_playwright/_rendering_type_predictor.py` | Core implementation of persistence with `RecoverableState` integration and an async context manager |
| `src/crawlee/crawlers/_adaptive_playwright/_utils.py` | Utility functions for scikit-learn model serialization and validation |
| `src/crawlee/crawlers/_adaptive_playwright/_adaptive_playwright_crawler.py` | Integration of the predictor into the crawler's context managers |
| `tests/unit/crawlers/_adaptive_playwright/test_predictor.py` | Updated tests to use the async context manager and added persistence tests |
| `tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py` | Added a `super().__init__()` call to the test mock class |
| `docs/guides/code_examples/playwright_crawler_adaptive/init_prediction.py` | Updated the example to properly call the parent constructor |
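Persisting a scikit-learn model means turning it into plain data and checking that data before trusting it on restore. One way to do this, which may differ from what `_utils.py` actually does, is to store only the fitted parameters and validate their shape when loading. The sketch below uses only the standard library: `LinearModel`, `serialize_model` and `deserialize_model` are hypothetical names, and `LinearModel` is a toy logistic classifier standing in for a real scikit-learn estimator.

```python
from __future__ import annotations

import json
import math
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical stand-in for a fitted binary classifier (not a scikit-learn estimator)."""

    coef: list[float]
    intercept: float
    classes: list[str]

    def predict_proba(self, features: list[float]) -> float:
        # Logistic function over the linear score: probability of classes[1].
        score = sum(w * x for w, x in zip(self.coef, features)) + self.intercept
        return 1.0 / (1.0 + math.exp(-score))

    def predict(self, features: list[float]) -> str:
        return self.classes[1] if self.predict_proba(features) >= 0.5 else self.classes[0]


def serialize_model(model: LinearModel) -> dict[str, object]:
    # Keep only plain JSON types so the result can be written to a key-value store.
    return {'coef': list(model.coef), 'intercept': model.intercept, 'classes': list(model.classes)}


def deserialize_model(data: dict[str, object], n_features: int) -> LinearModel:
    # Validate before trusting persisted data: a mismatch means stale or corrupted state.
    coef = data.get('coef')
    intercept = data.get('intercept')
    classes = data.get('classes')
    if not isinstance(coef, list) or len(coef) != n_features:
        raise ValueError(f'expected {n_features} coefficients')
    if not all(isinstance(w, (int, float)) for w in coef):
        raise ValueError('coefficients must be numbers')
    if not isinstance(intercept, (int, float)):
        raise ValueError('intercept must be a number')
    if not isinstance(classes, list) or len(classes) != 2:
        raise ValueError('expected exactly two class labels')
    return LinearModel([float(w) for w in coef], float(intercept), [str(c) for c in classes])


if __name__ == '__main__':
    model = LinearModel(coef=[2.0, -1.0], intercept=0.0, classes=['static', 'client only'])
    # Round-trip through JSON text, as a key-value store would.
    restored = deserialize_model(json.loads(json.dumps(serialize_model(model))), n_features=2)
    print(restored == model, restored.predict([1.0, 0.0]))
```

Rejecting malformed state with a `ValueError` lets the caller decide whether to fall back to a freshly initialized model instead of predicting with corrupted parameters.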
Comments suppressed due to low confidence (1)
tests/unit/crawlers/_adaptive_playwright/test_predictor.py:27
- The function name `ictor_same_label` appears to be truncated or misspelled, and without the `test_` prefix pytest will not collect it by default. It should likely be `test_predictor_same_label` or similar.
```python
async def ictor_same_label(url: str, expected_prediction: RenderingType, label: str | None) -> None:
```