fix: JSON handling with Parsel by janbuchar · Pull Request #490 · apify/crawlee-python
- Shouldn't we handle this in the BS crawler as well?
I'm not sure how to go about that. BS is super tolerant of malformed HTML:
>>> from bs4 import BeautifulSoup
>>> BeautifulSoup('{"hello": "world"}')
<html><body><p>{"hello": "world"}</p></body></html>
So... let me propose employing the Ostrich algorithm yet again?
- What about adding a try-except block when parsing the response? (Same for the BS crawler.)
try:
    parsel_selector = await asyncio.to_thread(lambda: Selector(body=context.http_response.read()))
except ValueError as exc:
    raise ValueError(
        f'Failed to parse the response body. Ensure the response is parsable via Parsel: {exc}'
    ) from exc
Are you sure the original exception is not clear enough? This will make the traceback longer, which is not great for readability.
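To make the traceback-length concern concrete, here is a minimal sketch that compares the rendered traceback with and without the wrapping `raise ... from exc`. It uses only the standard library. `parse_body` is a hypothetical stand-in for the Parsel call, not Parsel's actual behavior, and the helper names `parse_plain`, `parse_wrapped`, and `traceback_lines` are made up for illustration.

```python
import traceback


def parse_body(body: bytes) -> str:
    # Hypothetical stand-in for Selector(body=...). It only mimics a parser
    # that rejects some inputs with ValueError; it is not Parsel's real logic.
    if body.lstrip().startswith(b'{'):
        raise ValueError('body looks like JSON, not HTML')
    return body.decode()


def parse_plain(body: bytes) -> str:
    # Let the original exception propagate unchanged.
    return parse_body(body)


def parse_wrapped(body: bytes) -> str:
    # Wrap the exception in the style suggested in the review comment.
    try:
        return parse_body(body)
    except ValueError as exc:
        raise ValueError(f'Failed to parse the response body: {exc}') from exc


def traceback_lines(func, body: bytes) -> list[str]:
    # Render the full traceback, including any chained exceptions.
    try:
        func(body)
    except ValueError as exc:
        rendered = traceback.format_exception(type(exc), exc, exc.__traceback__)
        return ''.join(rendered).splitlines()
    return []


plain = traceback_lines(parse_plain, b'{"hello": "world"}')
wrapped = traceback_lines(parse_wrapped, b'{"hello": "world"}')

# The wrapped variant prints both tracebacks, joined by the
# "direct cause" separator, so it is always the longer of the two.
print(len(plain), len(wrapped))
```

The chained output repeats the original error and adds a second traceback on top of it, so the wrapper only pays off if its message tells the user something the original exception does not.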