ValueError("URL should be absolute") when crawling https://crawlee.dev/js/api/core/changelog and respecting robots.txt
Hello, I was trying my crawler on your website (specifically on https://crawlee.dev/js/api/core/changelog) and I encountered this error:
[crawlee.crawlers._basic._basic_crawler] WARN  Retrying request to https://crawlee.dev/js/api/core/changelog due to: URL should be absolute
  File "python3.12/site-packages/yarl/_url.py", line 628, in _origin
    raise ValueError("URL should be absolute")
This only happens when I set respect_robots_txt_file=True; with it set to False the crawl doesn't fail. This is my crawler config in case it helps:
crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
    playwright_crawler_specific_kwargs={
        "browser_type": "chromium",
        "headless": True,
    },
    configure_logging=True,
    use_session_pool=True,
    request_handler_timeout=timedelta(seconds=120),
    respect_robots_txt_file=True,
)
I am not planning to crawl your site ;) I was just using it as an example, but it looks like the robots.txt check may hit an error when it encounters a relative URL?
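For what it's worth, the error is consistent with yarl refusing to compute an origin for a relative URL. A minimal stdlib sketch of the kind of check presumably involved (robots_txt_url is a hypothetical helper for illustration, not crawlee's actual code):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Derive the robots.txt location from a page URL (hypothetical helper)."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        # A relative URL has no origin, mirroring yarl's
        # ValueError("URL should be absolute") seen in the traceback.
        raise ValueError("URL should be absolute")
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://crawlee.dev/js/api/core/changelog"))
# → https://crawlee.dev/robots.txt

# A relative request URL, e.g. "/js/api/core/changelog", would raise
# ValueError here, which matches the behavior I'm seeing.
```

So my guess is that somewhere a request with a relative URL reaches the robots.txt lookup before being resolved against the site's base URL.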