fix: add support non-GET requests for `PlaywrightCrawler` by Mantisus · Pull Request #1208 · apify/crawlee-python
Pull Request Overview
Adds support for issuing non-GET HTTP requests (including custom methods, headers, and payloads) in the PlaywrightCrawler by leveraging Playwright’s routing API, and updates the existing unit test to cover GET, POST, and POST-with-payload scenarios.
- Introduces a private `_prepare_request_handler` to configure method, headers, and payload via `route.continue_`.
- Emits a warning for performance implications when using non-GET methods.
- Refactors `test_basic_request` to parametrize HTTP methods and payloads.
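The routing approach summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `make_route_handler` and `RecordingRoute` are hypothetical names, and `RecordingRoute` is a stand-in that only records the overrides so the sketch runs without a browser. The only Playwright call assumed is `Route.continue_` with its `method`, `headers`, and `post_data` keyword arguments.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Mapping, Optional, Union

Payload = Union[str, bytes, None]


def make_route_handler(
    method: str,
    headers: Optional[Mapping[str, str]] = None,
    payload: Payload = None,
) -> Callable[[Any], Awaitable[None]]:
    """Build a handler that overrides method, headers, and body of an intercepted request."""

    async def handler(route: Any) -> None:
        # `route` is expected to expose Playwright's `Route.continue_` coroutine.
        await route.continue_(
            method=method,
            headers=dict(headers) if headers else None,
            post_data=payload,
        )

    return handler


class RecordingRoute:
    """Hypothetical stand-in for a Playwright route; it only records the overrides."""

    def __init__(self) -> None:
        self.overrides: Dict[str, Any] = {}

    async def continue_(self, **overrides: Any) -> None:
        self.overrides = overrides


# Drive the handler once against the stand-in route.
route = RecordingRoute()
asyncio.run(make_route_handler("POST", {"X-Test": "value"}, b"data")(route))
```

With a real browser page, a handler like this would typically be registered through `page.route(url, handler)` before calling `page.goto(url)`, so that the navigation request itself is intercepted and rewritten.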
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit/crawlers/_playwright/test_playwright_crawler.py | Refactored test_basic_request into a parametrized test covering GET, POST, and POST-with-payload. |
| src/crawlee/crawlers/_playwright/_playwright_crawler.py | Added _prepare_request_handler, warning logic in _navigate, and route registration for custom requests. |
Comments suppressed due to low confidence (3)
tests/unit/crawlers/_playwright/test_playwright_crawler.py:51
- Consider adding a test case that passes custom headers (e.g. `headers={'X-Test': 'value'}`) to verify non-GET requests include expected HTTP headers.
async def test_basic_request(method: HttpMethod, path: str, payload: HttpPayload, server_url: URL) -> None:
src/crawlee/crawlers/_playwright/_playwright_crawler.py:230
- Passing raw bytes as `post_data` may cause a type error in Playwright; consider decoding payload to `str` (e.g. `payload.decode()`) or otherwise converting it to the format expected by `route.continue_`.
await route.continue_(method=method, headers=dict(headers) if headers else None, post_data=payload)
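If decoding were wanted, the conversion suggested in this comment could look like the sketch below. `normalize_post_data` is a hypothetical helper, not part of the pull request, and the UTF-8 default is an assumption. Playwright's Python API documents `post_data` as accepting `str` or `bytes`, so this step may be unnecessary, and a binary payload that is not valid text would fail to decode.

```python
from typing import Optional, Union


def normalize_post_data(
    payload: Union[str, bytes, None],
    encoding: str = "utf-8",  # assumed default; the real payload encoding may differ
) -> Optional[str]:
    """Return the payload as text, leaving `None` and existing strings untouched."""
    if payload is None:
        return None
    if isinstance(payload, bytes):
        # Raises UnicodeDecodeError for bytes that are not valid in `encoding`.
        return payload.decode(encoding)
    return payload
```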
src/crawlee/crawlers/_playwright/_playwright_crawler.py:230
- Using `dict(headers)` may not correctly serialize the `HttpHeaders` model; consider calling `headers.model_dump()` or `.dict()` to produce a proper `dict[str, str]`.
await route.continue_(method=method, headers=dict(headers) if headers else None, post_data=payload)
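Whether `dict(headers)` is safe depends on how `HttpHeaders` is implemented. The sketch below uses a hypothetical stand-in class, `HeadersStandIn`, to show that `dict()` yields a plain `dict[str, str]` for any object implementing the `Mapping` protocol. That `HttpHeaders` behaves this way is an assumption to check against the crawlee source; a model that does not implement `Mapping` would not convert like this.

```python
from collections.abc import Mapping
from typing import Dict, Iterator


class HeadersStandIn(Mapping):
    """Hypothetical stand-in for a headers model with case-insensitive lookup."""

    def __init__(self, data: Dict[str, str]) -> None:
        # Store keys lowercased so lookups ignore case.
        self._data = {key.lower(): value for key, value in data.items()}

    def __getitem__(self, key: str) -> str:
        return self._data[key.lower()]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


# dict() reads the keys and then each value through the Mapping protocol.
plain = dict(HeadersStandIn({"X-Test": "value"}))
```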