fix: JSON handling with Parsel by janbuchar · Pull Request #490 · apify/crawlee-python

  1. Shouldn't we handle this in the BS crawler as well?

I'm not sure how to go about that. BS is super tolerant of malformed HTML:

>>> from bs4 import BeautifulSoup
>>> BeautifulSoup('{"hello": "world"}')
<html><body><p>{"hello": "world"}</p></body></html>

So... let me propose employing the Ostrich algorithm yet again?
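
If ignoring it ever stops being good enough, one option would be an explicit pre-check before the body reaches BeautifulSoup, since BS itself will not complain. A minimal sketch, using only the standard library: `looks_like_json` is a hypothetical helper, not something that exists in crawlee or bs4, and it assumes the body is UTF-8.

```python
import json


def looks_like_json(body: bytes) -> bool:
    """Return True if the body is UTF-8 text that parses as a JSON object or array.

    Hypothetical helper for illustration only; not part of crawlee or bs4.
    """
    try:
        text = body.decode('utf-8')
    except UnicodeDecodeError:
        return False

    stripped = text.lstrip()
    # Cheap guard: only objects and arrays are treated as JSON documents here.
    if not stripped.startswith(('{', '[')):
        return False

    try:
        json.loads(stripped)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

A crawler could call this on the raw response body and skip or reject the request before handing it to the HTML parser. Whether that is worth the extra parse is a separate question.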

  1. What about adding a try-except block when parsing the response? (Same for BS crawler.)
try:
    parsel_selector = await asyncio.to_thread(lambda: Selector(body=context.http_response.read()))
except ValueError as exc:
    raise ValueError(
        f'Failed to parse the response body. Ensure the response is parsable via Parsel: {exc}'
    ) from exc

Are you sure the original exception is not clear enough? This will make the traceback longer, which is not great for readability.
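
To make the traceback-length point concrete, here is a small stdlib-only sketch. `parse` and `parse_wrapped` are hypothetical stand-ins for the real Parsel call and the proposed wrapper, so this only shows how `raise ... from exc` affects the formatted traceback, not how Parsel actually behaves.

```python
import traceback


def parse(body: bytes) -> None:
    # Stand-in for the real parser call; it always fails for this demo.
    raise ValueError('cannot parse body')


def parse_wrapped(body: bytes) -> None:
    # Same shape as the proposed try-except: re-raise with extra context.
    try:
        parse(body)
    except ValueError as exc:
        raise ValueError(f'Failed to parse the response body: {exc}') from exc


def formatted_traceback(func, body: bytes) -> str:
    # Capture the traceback text that would be printed for an uncaught error.
    try:
        func(body)
    except ValueError as exc:
        return ''.join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return ''


plain = formatted_traceback(parse, b'{}')
wrapped = formatted_traceback(parse_wrapped, b'{}')
```

Here `wrapped` contains both exceptions, joined by Python's "The above exception was the direct cause of the following exception" separator, so it is always longer than `plain`.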