The `enqueue_links` function enqueues one more link than the `limit` keyword argument allows
Minimal reproducible code:
```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handle_listing(context: BeautifulSoupCrawlingContext) -> None:
        await context.enqueue_links(selector="h4 a", label="DETAIL", limit=1)

    @crawler.router.handler("DETAIL")
    async def handle_detail(context: BeautifulSoupCrawlingContext) -> None:
        print("Detail page URL:", context.request.url)

    await crawler.run(["https://honzajavorek.cz/blog/"])


if __name__ == "__main__":
    asyncio.run(main())
```
Run like this:
```
uv run --with crawlee[beautifulsoup] python crawlee_bug.py
```
Expected output:
```
Detail page URL: https://honzajavorek.cz/blog/tydenni-poznamky-vanoce-a-tak/
```
Actual output:
```
Detail page URL: https://honzajavorek.cz/blog/tydenni-poznamky-vanoce-a-tak/
Detail page URL: https://honzajavorek.cz/blog/tydenni-poznamky-vylepsovani-seznamu-kandidatu-nove-bydleni-a-odpocinek-v-mlze/
```
If I change the limit to 5, I get 6 links in the output, so the crawler consistently enqueues `limit + 1` links.
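The consistent `limit + 1` behavior smells like an off-by-one in whatever loop applies the limit. As a minimal sketch of the kind of check that would produce exactly this symptom (an illustration only, not Crawlee's actual code, which I was unable to locate):

```python
def take(items, limit, buggy=True):
    """Collect at most `limit` items (illustration of a limit off-by-one)."""
    out = []
    for item in items:
        # The buggy variant checks the count *before* appending the current
        # item, so the loop breaks only after limit + 1 items were collected.
        if (len(out) > limit) if buggy else (len(out) >= limit):
            break
        out.append(item)
    return out

print(take(range(10), 1))               # → [0, 1]  (limit + 1 items)
print(take(range(10), 1, buggy=False))  # → [0]     (exactly limit items)
```

If the real limit check lives somewhere along the `enqueue_links` call chain and uses a strict `>` comparison like the buggy variant above, that would explain both the `limit=1` → 2 links and the `limit=5` → 6 links observations.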
I tried to nail down the problem, but I spent 20 minutes drowning in a sea of type definitions, kwargs passed down as-is, and nested `_create_...` functions. It was impossible for me to follow the flow of the keyword arguments and find the place where the limit actually gets applied.