feat: add periodic status logging and `status_message_callback` parameter for customization by Mantisus · Pull Request #1265 · apify/crawlee-python

Pull Request Overview

This PR introduces periodic status logging and the ability to customize status messages via a new callback parameter. Key changes include:

  • Adding the `status_message_callback` and `status_message_logging_interval` parameters to `BasicCrawler`.
  • Emitting a new CRAWLER_STATUS event and associated data type.
  • Extending the event manager and updating logging configuration to support custom log levels.
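The mechanism summarized above can be illustrated with a minimal, standard-library-only sketch. It does not use crawlee itself: `Snapshot`, `StatusReporter`, and `report_once` are hypothetical names invented for this illustration, and only the callback's argument order (current state, previous state, message) follows the test signature quoted later in this review. The real `BasicCrawler` drives the loop with a `RecurringTask` and also emits a `CRAWLER_STATUS` event, both of which are omitted here.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for the crawler's statistics state."""

    finished: int
    failed: int


# Same argument order as the callback in the PR's test:
# (state, previous_state, message) -> None
StatusCallback = Callable[[Snapshot, Optional[Snapshot], str], None]


class StatusReporter:
    """Illustrative periodic reporter; not part of crawlee."""

    def __init__(self, interval: float, callback: Optional[StatusCallback] = None) -> None:
        self.interval = interval  # seconds between status reports
        self.callback = callback  # optional hook that replaces the default output
        self.finished = 0
        self.failed = 0
        self._previous: Optional[Snapshot] = None

    def report_once(self) -> str:
        """Build the default message and hand it to the callback, if any."""
        current = Snapshot(self.finished, self.failed)
        message = f"Crawled {current.finished} pages, {current.failed} failed."
        if self.callback is not None:
            self.callback(current, self._previous, message)
        else:
            print(message)
        # Remember this snapshot so the next report can compare against it.
        self._previous = current
        return message

    async def run(self, ticks: int) -> None:
        """Report `ticks` times, sleeping `interval` seconds before each report."""
        for _ in range(ticks):
            await asyncio.sleep(self.interval)
            self.report_once()
```

In this sketch the first callback invocation receives `None` as the previous state, and each later invocation receives the snapshot from the preceding tick, which is what makes rate-style messages (pages per interval) possible inside a custom callback.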

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tests/unit/crawlers/_basic/test_basic_crawler.py` | Added tests for the new status message callback and event emission. |
| `src/crawlee/events/_types.py`, `src/crawlee/events/_event_manager.py`, `src/crawlee/events/__init__.py` | Introduced and exposed the new `EventCrawlerStatusData` and related overloads. |
| `src/crawlee/crawlers/_basic/_basic_crawler.py` | Integrated periodic status logging via a `RecurringTask` and a configurable callback, along with corresponding documentation updates. |
| `src/crawlee/_log_config.py` | Added a helper function (`string_to_log_level`) and updated log level configuration. |

Comments suppressed due to low confidence (2)

src/crawlee/crawlers/_basic/_basic_crawler.py:1590

  • The docstring for 'status_message_callback' suggests that the callback should call 'crawler.setStatusMessage()' explicitly, but in _crawler_state_task the callback is invoked without any further logging. Consider clarifying the documentation or adjusting the implementation to ensure consistent behavior.
        if self._status_message_callback:

tests/unit/crawlers/_basic/test_basic_crawler.py:1429

  • [nitpick] The test defines a synchronous 'status_callback', but if real-world usage may require asynchronous processing for the status callback, it might be beneficial to either update the callback's signature or document the expectation clearly.
    def status_callback(state: StatisticsState, previous_state: StatisticsState | None, message: str) -> None:
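
One way to address the nitpick above is to let the caller accept both synchronous and asynchronous callbacks. The helper below is a sketch under that assumption. `invoke_status_callback` is a hypothetical name, not a crawlee function, and nothing here claims that the PR resolves the question this way.

```python
import asyncio
import inspect
from typing import Any, Callable


async def invoke_status_callback(callback: Callable[..., Any], *args: Any) -> Any:
    """Call `callback` with `args`, awaiting the result only if it is awaitable."""
    result = callback(*args)
    if inspect.isawaitable(result):
        # The callback was an `async def` (or returned a future); wait for it.
        result = await result
    return result
```

With such a helper, a plain `def status_callback(...)` like the one in the test and an `async def` variant could both be passed without changing the call site. The trade-off is a looser type signature, so documenting the accepted forms would still be worthwhile.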