Mask Playwright's "headless" headers
Description
- When Playwright runs a browser in headless mode, certain HTTP headers (such as User-Agent and Sec-Ch-Ua) reveal this by including the substring "Headless" (for example, "HeadlessChrome"). This makes it easy for anti-scraping systems to detect and block automated browsers.
- To avoid detection, these headers should be replaced with the values a regular, non-headless browser would send.
- Example of header values for headless Chromium:
{
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"Accept-Encoding": "gzip, deflate, br, zstd",
"Host": "httpbin.org",
"Priority": "u=0, i",
"Sec-Ch-Ua": "\"Chromium\";v=\"128\", \"Not;A=Brand\";v=\"24\", \"HeadlessChrome\";v=\"128\"",
"Sec-Ch-Ua-Mobile": "?0",
"Sec-Ch-Ua-Platform": "\"Linux\"",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
"Sec-Fetch-User": "?1",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/128.0.6613.18 Safari/537.36",
"X-Amzn-Trace-Id": "Root=1-66d04117-141b301674c02e4e2136f1f1"
}

Relevant links
- https://github.com/apify/fingerprint-suite/tree/master/packages/fingerprint-injector
- https://github.com/apify/crawlee/blob/v3.11.4/packages/browser-pool/src/abstract-classes/browser-plugin.ts#L20:L21
- https://github.com/apify/crawlee/blob/v3.11.4/packages/browser-pool/src/abstract-classes/browser-plugin.ts#L213
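
As a rough illustration of the masking described above, the following sketch rewrites the two revealing headers from the example. The function name `mask_headless_headers` is hypothetical, and the replacement values are assumptions: it assumes a regular Chrome build reports `Chrome/<version>` in User-Agent and a `"Google Chrome"` brand in Sec-Ch-Ua. It only transforms a header dictionary and does not touch Playwright itself.

```python
def mask_headless_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` with headless markers replaced.

    Hypothetical helper: the replacement values are assumptions about
    what a regular (headed) Chrome would send.
    """
    masked = dict(headers)
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "user-agent":
            # "HeadlessChrome/128.0..." -> "Chrome/128.0..."
            masked[name] = value.replace("HeadlessChrome/", "Chrome/")
        elif lowered == "sec-ch-ua":
            # Swap the headless brand for the assumed regular brand name.
            masked[name] = value.replace('"HeadlessChrome"', '"Google Chrome"')
    return masked


# Header values taken from the example in this issue.
original = {
    "Sec-Ch-Ua": '"Chromium";v="128", "Not;A=Brand";v="24", "HeadlessChrome";v="128"',
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) HeadlessChrome/128.0.6613.18 Safari/537.36"
    ),
    "Sec-Ch-Ua-Mobile": "?0",
}
masked = mask_headless_headers(original)
```

One possible way to apply the result, not verified against this project, would be to pass the masked User-Agent as `user_agent` and the masked Sec-Ch-Ua via `extra_http_headers` when creating a Playwright browser context. A fuller solution would likely generate a consistent fingerprint (as fingerprint-suite does) rather than patch individual strings.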