feat: add support `use_incognito_pages` for `browser_launch_options` in `PlaywrightCrawler` by Mantisus · Pull Request #941 · apify/crawlee-python
Description
- Improve cookie handling for `PlaywrightCrawler`. Cookies are now stored in the `Session` and set in the Playwright context from the `Session`.
- Add the `use_incognito_pages` option for `browser_launch_options`, allowing each new page to be launched in a separate context.
Issues
Does the incognito pages option have any relationship with the cookie handling? Also, a test would be nice 🙂
Yes. When we work in basic mode, we are working with one common context. In this case, cookies will leak between sessions.
However, with incognito pages (one page per context), we get a more controlled situation, where each session works only with the cookies intended for it.
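The difference between a shared context and one context per page can be illustrated with a small model. This is only a sketch: `FakeContext`, `FakeSession`, `FakeBrowser` and `visible_cookies` are hypothetical stand-ins invented for illustration, not Crawlee or Playwright classes, and the real cookie handling in the PR may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Hypothetical stand-in for a browser context: it owns one cookie jar."""

    cookies: dict[str, str] = field(default_factory=dict)


@dataclass
class FakeSession:
    """Hypothetical stand-in for a crawler session that stores its own cookies."""

    id: str
    cookies: dict[str, str] = field(default_factory=dict)


class FakeBrowser:
    """Hypothetical browser that either shares one context or isolates pages."""

    def __init__(self, use_incognito_pages: bool) -> None:
        self.use_incognito_pages = use_incognito_pages
        self._shared = FakeContext()

    def new_page_context(self, session: FakeSession) -> FakeContext:
        # Incognito mode: a fresh context per page. Otherwise reuse the shared one.
        context = FakeContext() if self.use_incognito_pages else self._shared
        # Cookies flow from the session into the context before the page is used.
        context.cookies.update(session.cookies)
        return context


def visible_cookies(use_incognito_pages: bool) -> dict[str, str]:
    """Return the cookies that session 'b' ends up seeing in its page context."""
    browser = FakeBrowser(use_incognito_pages)
    browser.new_page_context(FakeSession('a', {'token': 'A'}))
    context_b = browser.new_page_context(FakeSession('b', {'lang': 'en'}))
    return dict(context_b.cookies)
```

In this model, `visible_cookies(False)` also contains the `token` cookie of session `a`, while `visible_cookies(True)` contains only the cookies that session `b` brought with it.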
> Does the incognito pages option have any relationship with the cookie handling?
>
> Yes. When we work in basic mode, we are working with one common context. In this case, cookies will leak between sessions.

Can this be a desirable state for anyone?

> However, with incognito pages (one page per context), we get a more controlled situation, where each session works only with the cookies intended for it.

I sort of think that this should be the default. Do we really need to make it configurable? What does the JS version do?
> What does the JS version do?

It's not the default, because it means no browser cache and therefore a huge perf cost. When we were testing this, things took literally twice as long to finish.
So does the JS PlaywrightCrawler also not store cookies in the Session?
It does store them, but with the basic setup they leak between sessions in the same way, through the shared Playwright context.
This is the thing we really need to redesign in the next major. IDK how cookies behave, but the more important part is that, because of this, we keep the same proxy for a whole browser instance with the defaults (so persistent contexts). In other words, sessions rotate per request, but proxies rotate only per browser (we have some limits on how many times a browser instance can be used).
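The mismatch described here can be sketched as a toy model. Everything in it is an assumption made for illustration: `assign`, `max_pages_per_browser` and the `session-N` names are hypothetical, and the code only mirrors the behavior as described in this comment, not the actual Crawlee implementation.

```python
from itertools import cycle


def assign(requests: int, proxies: list[str], max_pages_per_browser: int) -> list[tuple[str, str]]:
    """Pair each request with a session and a proxy under the described defaults.

    A new session is used for every request, but the proxy only changes when
    a new browser instance is started (every `max_pages_per_browser` requests).
    """
    proxy_iter = cycle(proxies)
    current_proxy = ''
    result = []
    for i in range(requests):
        if i % max_pages_per_browser == 0:
            # A new browser instance is launched, so a new proxy is picked.
            current_proxy = next(proxy_iter)
        result.append((f'session-{i}', current_proxy))
    return result
```

With `assign(4, ['p1', 'p2'], 2)`, all four requests get distinct sessions, yet the first two share proxy `p1` and the last two share `p2`.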
Last nit, otherwise LGTM, but let's wait for @janbuchar's approve as well.