Crawlee has empty response on some URLs
Hi,
I've got some very basic code trying to scrape a website. However, while Crawlee works fine for most of the site, it fails on some pages: the parsed soup comes back empty.
Fails:
https://www.mtggoldfish.com/deck/6610848
https://www.mtggoldfish.com/deck/6610848#paper
https://www.mtggoldfish.com/deck/6610847
Succeeds:
https://www.mtggoldfish.com/metagame/standard#paper
https://www.mtggoldfish.com/deck/download/6610848
I'm really not sure why this is happening.
@router.handler("deck")
async def deck_handler(context: BeautifulSoupCrawlingContext):
    context.log.info(f"Deck handler: {context.request.url}")
    deck_id = context.request.url.split("/")[-1].replace("#paper", "")
    context.log.info(f"Soup: {context.soup}")
    deck_information = context.soup.find(
        "p", class_="deck-container-information"
    )
    context.log.info(f"Deck information: {deck_information}")
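A side note on the `deck_id` line in the handler: the `#paper` part of a URL is a fragment, which the client does not send to the server, so the first two failing URLs should result in the same HTTP request. A minimal sketch of extracting the ID without hard-coding `#paper`, using only the standard library (the helper name `deck_id_from_url` is made up for illustration and is not part of Crawlee):

```python
from urllib.parse import urlsplit


def deck_id_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query or fragment."""
    # urlsplit separates the fragment ("#paper") from the path,
    # so no string replacement is needed.
    path = urlsplit(url).path
    return path.rstrip("/").split("/")[-1]
```

This does not explain the empty soup; it only makes the ID parsing independent of which fragment is present.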
[crawlee.beautifulsoup_crawler._beautifulsoup_crawler] INFO Deck handler: https://www.mtggoldfish.com/deck/6610848#paper
[crawlee.beautifulsoup_crawler._beautifulsoup_crawler] INFO Soup:
[crawlee.beautifulsoup_crawler._beautifulsoup_crawler] INFO Deck information: None
Other pages all seem to work, so I'm unsure why it fails on these.
I'm not really sure how to debug this, since no error is thrown up to that point.
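One possible way to narrow this down, as a rough sketch: fetch a failing and a working URL outside Crawlee and compare the status code and body size. This uses only the standard library, not Crawlee's API. The helper names (`summarize_response`, `fetch_summary`) are made up for illustration, and the default `User-Agent` value is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def summarize_response(status: int, body: bytes) -> dict:
    """Summarize a response so failing and working URLs can be compared."""
    text = body.decode("utf-8", errors="replace")
    return {
        "status": status,
        "body_bytes": len(body),
        "is_empty": len(text.strip()) == 0,
        # The class name the handler searches for in the soup.
        "has_deck_info": "deck-container-information" in text,
    }


def fetch_summary(url: str, user_agent: str = "Mozilla/5.0") -> dict:
    """Fetch a URL with urllib and return a summary of the response."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return summarize_response(response.status, response.read())
    except urllib.error.HTTPError as error:
        # Non-2xx responses raise, but their status and body are still useful.
        return summarize_response(error.code, error.read())
```

Calling `fetch_summary` on one URL from each list and comparing the two results would show whether the server itself returns an empty or different body for the failing pages, or whether the content is lost somewhere inside the crawler.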
Thanks.