feat: add periodic status logging and `status_message_callback` parameter for customization by Mantisus · Pull Request #1265 · apify/crawlee-python
Pull Request Overview
This PR introduces periodic status logging and the ability to customize status messages via a new callback parameter. Key changes include:
- Adding the status_message_callback and status_message_logging_interval parameters to the BasicCrawler.
- Emitting a new CRAWLER_STATUS event and associated data type.
- Extending the event manager and updating logging configuration to support custom log levels.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/unit/crawlers/_basic/test_basic_crawler.py | Added tests for the new status message callback and event emission. |
| src/crawlee/events/_types.py, src/crawlee/events/_event_manager.py, src/crawlee/events/init.py | Introduced and exposed the new EventCrawlerStatusData and related overloads. |
| src/crawlee/crawlers/_basic/_basic_crawler.py | Integrated periodic status logging via a RecurringTask and configurable callback along with corresponding documentation updates. |
| src/crawlee/_log_config.py | Added a helper function (string_to_log_level) and updated log level configuration. |
Comments suppressed due to low confidence (2)
src/crawlee/crawlers/_basic/_basic_crawler.py:1590
- The docstring for 'status_message_callback' suggests that the callback should call 'crawler.setStatusMessage()' explicitly, but in _crawler_state_task the callback is invoked without any further logging. Consider clarifying the documentation or adjusting the implementation to ensure consistent behavior.
if self._status_message_callback:
tests/unit/crawlers/_basic/test_basic_crawler.py:1429
- [nitpick] The test defines a synchronous 'status_callback', but if real-world usage may require asynchronous processing for the status callback, it might be beneficial to either update the callback's signature or document the expectation clearly.
def status_callback(state: StatisticsState, previous_state: StatisticsState | None, message: str) -> None: