feat: add Session binding capability via `session_id` in `Request` by Mantisus · Pull Request #1086 · apify/crawlee-python

I like the idea, thank you for including me!

A few concerns / ideas:

Requests are split between RQ and SessionPool

I am a bit wary of this decentralized state - the request is now effectively split between the RequestQueue (URL, headers, body) and the SessionPool (the Cookie header, specifically). Granted, this divide existed before, but users couldn't rely on it, so cookies could not be considered a required part of making the request. I'm not sure what a better solution would be, though.

Better DX

Do we support passing session ID to requests added by enqueue_links (or other Crawlee-native methods)?

Ideally, I'd like to do something like this:

```python
async def request_handler(context) -> None:
    ...
    # I'm logged in as user A in the current request.
    # The crawler would then visit all the child links as user A:
    await context.enqueue_links(session_id=context.session.id)
```

Unstable proxy?

Maybe I'm overthinking this, but some proxy errors can cause a session to get retired (since ProxyError is a descendant of SessionError). Would one proxy hiccup (Apify proxies are, as far as I know, quite flaky) cause all the requests bound to that session to fail? I do agree with @Mantisus's reasoning (fail the request on a missing session), but it still sounds like very strict behavior (though maybe that's exactly what users want).
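A toy model of the blast radius I'm worried about (the classes below are simplified stand-ins; the only thing taken from the source is the subclass relationship between ProxyError and SessionError): any handler that retires the session on SessionError also matches ProxyError, so a single transient proxy failure retires the session for every request bound to it.

```python
class SessionError(Exception):
    pass

class ProxyError(SessionError):  # descendant of SessionError, as noted above
    pass

class Session:
    def __init__(self, session_id: str) -> None:
        self.id = session_id
        self.retired = False

def handle_failure(exc: Exception, session: Session) -> None:
    # A session-level error retires the session.
    # isinstance() also matches ProxyError, since it subclasses SessionError.
    if isinstance(exc, SessionError):
        session.retired = True

def failed_requests(bound_urls: list[str], session: Session) -> list[str]:
    # Under strict "fail on missing/retired session" semantics, every
    # request bound to a retired session fails rather than being retried
    # with a fresh session.
    return list(bound_urls) if session.retired else []

session = Session("s1")
handle_failure(ProxyError("one proxy hiccup"), session)
print(failed_requests(["https://a.example", "https://b.example"], session))
# -> both bound requests fail after a single transient proxy error
```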

I'm sorry to provide a fragmentary review like this, I'm sure you Python guys have thought of everything else :)