feat: add support `use_incognito_pages` for `browser_launch_options` in `PlaywrightCrawler` by Mantisus · Pull Request #941 · apify/crawlee-python

@Mantisus

Description

  • Improve cookie handling for PlaywrightCrawler. Cookies are now stored in the Session and set in Playwright Context from the Session.
  • Add use_incognito_pages option for browser_launch_options allowing each new page to be launched in a separate context.
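To make the first bullet concrete, here is a minimal sketch of the cookie round trip it describes, using only the standard library. `ToySession`, `ToyContext`, and `handle_request` are hypothetical stand-ins invented for illustration. They are not Crawlee or Playwright classes, and the real implementation in this PR may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for a crawler session that owns its cookies."""

    id: str
    cookies: dict[str, str] = field(default_factory=dict)


@dataclass
class ToyContext:
    """Hypothetical stand-in for a browser context with its own cookie jar."""

    cookies: dict[str, str] = field(default_factory=dict)

    def add_cookies(self, cookies: dict[str, str]) -> None:
        self.cookies.update(cookies)

    def visit(self, set_cookie: dict[str, str]) -> None:
        # Simulate a server response that sets cookies in the browser.
        self.cookies.update(set_cookie)


def handle_request(session: ToySession, context: ToyContext, set_cookie: dict[str, str]) -> None:
    # 1. Before navigation: push the session's cookies into the context.
    context.add_cookies(session.cookies)
    # 2. Navigate; the site may set or change cookies.
    context.visit(set_cookie)
    # 3. After navigation: store the context's cookies back in the session.
    session.cookies = dict(context.cookies)


session = ToySession(id='s1', cookies={'lang': 'en'})
handle_request(session, ToyContext(), {'token': 'abc'})
print(session.cookies)
```

With a fresh context per page, the session ends up holding exactly its own cookies plus whatever the site set during the visit.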

@janbuchar

Does the incognito pages option have any relationship with the cookie handling? Also, a test would be nice 🙂

@Mantisus

> Does the incognito pages option have any relationship with the cookie handling?

Yes. In basic mode, all pages share one common context, so cookies can leak between sessions.

However, with incognito pages (one page per context), the situation is more controlled: each session works only with the cookies intended for it.
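A toy model of the difference, in plain Python. The `crawl` function and the cookie values are made up for illustration. A dict stands in for a context's cookie jar, which is a simplification of what Playwright actually does.

```python
def crawl(site_cookies, shared_context):
    """Return the cookies each session ends up with after one visit.

    site_cookies: mapping of session id -> cookies the site sets for that session.
    shared_context: True for one common jar, False for a fresh jar per page.
    """
    common_jar = {}
    result = {}
    for session_id, set_cookie in site_cookies.items():
        # Basic mode reuses one jar; incognito mode starts each page with an empty one.
        jar = common_jar if shared_context else {}
        jar.update(set_cookie)  # the site sets cookies during the visit
        result[session_id] = dict(jar)  # cookies saved back to the session
    return result


site_cookies = {'session_a': {'user': 'alice'}, 'session_b': {'cart': '42'}}

shared = crawl(site_cookies, shared_context=True)
isolated = crawl(site_cookies, shared_context=False)

print(shared['session_b'])    # also contains session_a's cookie
print(isolated['session_b'])  # only its own cookie
```

In the shared case `session_b` inherits `user=alice` from `session_a`. In the isolated case it sees only `cart=42`.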

@janbuchar

> Does the incognito pages option have any relationship with the cookie handling?
>
> Yes. In basic mode, all pages share one common context, so cookies can leak between sessions.

Can this be a desirable state for anyone?

> However, with incognito pages (one page per context), the situation is more controlled: each session works only with the cookies intended for it.

I sort of think that this should be the default. Do we really need to make it configurable? What does the JS version do?

@B4nan

> What does the JS version do?

It's not the default because it means no browser cache, which is a huge perf cost. When we were testing this, things took literally twice as long to finish because of that.

@janbuchar

> What does the JS version do?
>
> It's not the default because it means no browser cache, which is a huge perf cost. When we were testing this, things took literally twice as long to finish because of that.

So does the JS PlaywrightCrawler also not store cookies in the Session?

@Mantisus

> So does the JS PlaywrightCrawler also not store cookies in the Session?

It does store them, but with the basic setup they flow between sessions through the shared Playwright context in the same way.

@B4nan

This is the thing we really need to redesign in the next major. IDK how cookies behave, but the more important part is that, because of this, we keep the same proxy for a whole browser instance with the defaults (so persistent contexts). In other words, sessions rotate per request, but proxies rotate only per browser (we have some limits on how many times a browser instance can be used).
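A rough simulation of that mismatch, assuming simple round-robin rotation and a fixed page limit per browser. The `simulate` function, the rotation strategy, and the names `s1`, `p1`, and so on are all assumptions made for illustration. The actual pools may select sessions and proxies differently.

```python
from itertools import cycle


def simulate(num_requests, sessions, proxies, max_pages_per_browser):
    """Return a list of (session, proxy) pairs, one per request."""
    session_pool = cycle(sessions)
    proxy_pool = cycle(proxies)
    log = []
    pages_in_browser = 0
    proxy = next(proxy_pool)  # proxy chosen when the first browser launches
    for _ in range(num_requests):
        if pages_in_browser == max_pages_per_browser:
            # Browser retired: a new one is launched and picks the next proxy.
            proxy = next(proxy_pool)
            pages_in_browser = 0
        # The session changes on every request; the proxy stays with the browser.
        log.append((next(session_pool), proxy))
        pages_in_browser += 1
    return log


log = simulate(6, ['s1', 's2', 's3'], ['p1', 'p2'], max_pages_per_browser=3)
for session, proxy in log:
    print(session, proxy)
```

Every session in the first three requests goes out through `p1`, and the proxy changes to `p2` only once the browser is replaced.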

vdusek

Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>

vdusek

Last nit, otherwise LGTM, but let's wait for @janbuchar's approval as well.
