fix: add support non-GET requests for `PlaywrightCrawler` by Mantisus · Pull Request #1208 · apify/crawlee-python

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds support for issuing non-GET HTTP requests (including custom methods, headers, and payloads) in the PlaywrightCrawler by leveraging Playwright’s routing API, and updates the existing unit test to cover GET, POST, and POST-with-payload scenarios.

  • Introduces a private _prepare_request_handler to configure method, headers, and payload via route.continue_.
  • Emits a warning for performance implications when using non-GET methods.
  • Refactors test_basic_request to parametrize HTTP methods and payloads.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
tests/unit/crawlers/_playwright/test_playwright_crawler.py Refactored test_basic_request into a parametrized test covering GET, POST, and POST-with-payload.
src/crawlee/crawlers/_playwright/_playwright_crawler.py Added _prepare_request_handler, warning logic in _navigate, and route registration for custom requests.
Comments suppressed due to low confidence (3)

tests/unit/crawlers/_playwright/test_playwright_crawler.py:51

  • Consider adding a test case that passes custom headers (e.g. headers={'X-Test': 'value'}) to verify non-GET requests include expected HTTP headers.
async def test_basic_request(method: HttpMethod, path: str, payload: HttpPayload, server_url: URL) -> None:

src/crawlee/crawlers/_playwright/_playwright_crawler.py:230

  • Passing raw bytes as post_data may cause a type error in Playwright; consider decoding payload to str (e.g. payload.decode()) or otherwise converting it to the format expected by route.continue_.
await route.continue_(method=method, headers=dict(headers) if headers else None, post_data=payload)

src/crawlee/crawlers/_playwright/_playwright_crawler.py:230

  • Using dict(headers) may not correctly serialize the HttpHeaders model; consider calling headers.model_dump() or .dict() to produce a proper dict[str, str].
await route.continue_(method=method, headers=dict(headers) if headers else None, post_data=payload)