Scrape API (pre-rendering, server-side rendering, screenshots, PDFs, and scraping) | Headless-Render-API
This document describes the Scrape functionality/endpoint, its associated HTTP header options, and possible error responses.
Endpoint: service.headless-render-api.com
Scrape Path: /scrape/$URL
Example URL:
https://service.headless-render-api.com/scrape/https://example.com/
The target $URL is obviously part of the path, but do not re-encode or re-escape it beyond what you'd enter into a browser URL field. Both UTF-8 encoding and percent-encoding are acceptable.
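As a rough illustration of the "do not re-encode" rule, here is a minimal sketch of a URL builder. The names buildScrapeUrl and SCRAPE_BASE are hypothetical helpers for this example, not part of any official client.

```javascript
// Hypothetical helper (not part of the official client): builds a Scrape API URL.
const SCRAPE_BASE = "https://service.headless-render-api.com/scrape/";

function buildScrapeUrl(targetUrl) {
  // Parse only to validate that the target is an absolute http(s) URL.
  const parsed = new URL(targetUrl);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  // Append the target exactly as given. Calling encodeURIComponent here would
  // turn "https://" into "https%3A%2F%2F" and double-encode existing escapes.
  return SCRAPE_BASE + targetUrl;
}
```

Calling buildScrapeUrl("https://example.com/") yields the example URL shown above; a target that already contains percent-escapes is passed through untouched.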
Related links:
- examples of using the Scrape API
- clients
- Node.js client
- /.well-known/open-api.yaml for OpenAPI/Swagger
- /.well-known/ai-plugin.json for LLMs like ChatGPT
Auth
Send your secret API token (you'll get it after creating an account) as a header with all your requests to avoid rate limiting.
curl --header "X-Prerender-Token: secret-token"
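The same header can be sent from Node.js. The sketch below assumes Node 18+ (for the global fetch); authHeaders and fetchScrape are hypothetical names for this example, and where you store the token (for instance an environment variable) is up to you.

```javascript
// Hypothetical helper: returns the auth header described above.
function authHeaders(token) {
  if (!token) throw new Error("missing API token");
  return { "X-Prerender-Token": token };
}

// Sketch of an authenticated scrape request using the global fetch (Node 18+).
async function fetchScrape(targetUrl, token) {
  const res = await fetch(
    "https://service.headless-render-api.com/scrape/" + targetUrl,
    { headers: authHeaders(token) }
  );
  // Return the HTTP status alongside the response text.
  return { statusCode: res.status, body: await res.text() };
}
```

Usage would look like: const { statusCode, body } = await fetchScrape("https://example.com/", "secret-token");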
Webpage Scrape API
A scraping API. Try it in your browser: service.headless-render-api.com/scrape/https://example.com
cURL
curl https://service.headless-render-api.com/scrape/https://example.com/ > out.html
Node.js client
// npm install prerendercloud
const prerendercloud = require("prerendercloud");
// await needs an async context, so wrap the call in an async function
(async () => {
  const { body, meta, screenshot, statusCode, headers } = await prerendercloud.scrape("https://example.com/", {
    withMetadata: true,
    withScreenshot: true,
  });
})();
Waits longer than normal for a page to finish rendering. Similar to Puppeteer's { waitUntil: 'networkidle' }. Useful for pages that depend on AJAX/XHR requests that fire late, or for IPFS-hosted pages.
curl --header "Prerender-Wait-Extra-Long: true"
By default, headless-render-api.com waits for all WebSocket activity to finish, but it doesn't make sense to wait for sockets that never go quiet. An example: real-time prices on a stock price website.
curl --header "prerender-dont-wait-for-web-sockets: true"
By default, headless-render-api.com sends cookies back to the server. Use this to block them.
curl --header "prerender-block-cookies: true"
By default, if your origin server returns a 301/302, headless-render-api.com returns that redirect as-is, which is appropriate for the common use case of proxying traffic through headless-render-api.com. If you are using the API for crawling, scraping, or batching, you may want to follow redirects instead.
curl --header "Prerender-Follow-Redirects: true"
Overrides device screen width (default: 1366)
curl --header "prerender-device-width: 800"
Overrides device screen height (default: 768)
curl --header "prerender-device-height: 600"
Whether to emulate a mobile device (default: false).
This includes the viewport meta tag, overlay scrollbars, text autosizing and more. In other words, it controls whether the meta viewport tag is taken into account.
An example of a viewport meta tag: <meta name="viewport" content="width=device-width, initial-scale=1">
curl --header "prerender-device-is-mobile: true"
Emulates the given media type or media feature for CSS media queries
default: screen
Possible values are: screen, print, braille, embossed, handheld, projection, speech, tty, tv
curl --header "Prerender-Emulated-Media: screen"
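If you set several of these options at once, it can help to map friendly option names onto the header names. This is only a sketch: the option names on the left (waitExtraLong, deviceWidth, and so on) are made up for this example, while the header names on the right are the ones documented above.

```javascript
// Hypothetical option names mapped to the documented request headers.
const HEADER_FOR_OPTION = new Map([
  ["waitExtraLong", "Prerender-Wait-Extra-Long"],
  ["dontWaitForWebSockets", "prerender-dont-wait-for-web-sockets"],
  ["blockCookies", "prerender-block-cookies"],
  ["followRedirects", "Prerender-Follow-Redirects"],
  ["deviceWidth", "prerender-device-width"],
  ["deviceHeight", "prerender-device-height"],
  ["deviceIsMobile", "prerender-device-is-mobile"],
  ["emulatedMedia", "Prerender-Emulated-Media"],
]);

function optionHeaders(options) {
  const headers = {};
  for (const [name, value] of Object.entries(options)) {
    const header = HEADER_FOR_OPTION.get(name);
    if (!header) throw new Error(`unknown option: ${name}`);
    // Header values are sent as strings: true becomes "true", 800 becomes "800".
    headers[header] = String(value);
  }
  return headers;
}
```

For example, optionHeaders({ deviceWidth: 800, deviceHeight: 600, followRedirects: true }) produces an object you can pass as the headers of a fetch call.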
Changes the response from text/html to application/json and returns an object with 2 fields: {body, screenshot}
The values are base64 encoded:
- body is the normal pre-rendered response
- screenshot is a base64 encoded PNG
This is useful for saving screenshots or injecting them as Open Graph images. Often used with Prerender-With-Metadata.
curl --header "Prerender-With-Screenshot: true"
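To make the JSON response concrete, here is a minimal sketch of decoding it in Node.js. decodeScreenshotResponse is a hypothetical helper, and treating the decoded body as UTF-8 text is an assumption of this example.

```javascript
// Hypothetical helper: decodes the {body, screenshot} JSON response text.
function decodeScreenshotResponse(jsonText) {
  const { body, screenshot } = JSON.parse(jsonText);
  return {
    // Assumption: the pre-rendered HTML is UTF-8 once base64-decoded.
    html: Buffer.from(body, "base64").toString("utf8"),
    // Raw PNG bytes, ready to be written to a file.
    png: Buffer.from(screenshot, "base64"),
  };
}
```

The png buffer can then be saved with something like require("fs").writeFileSync("out.png", png).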
Changes the response from text/html to application/json and returns an object with 3 fields: {body, meta, links}
The values are base64 encoded:
- body is the normal pre-rendered response
- meta is an object that includes { title, metaDescription, h1, ogImage, ogTitle, twitterCard }
- links is an array of URLs/paths on the page
This is useful for capturing SEO-relevant metadata during the pre-render.
curl --header "Prerender-With-Metadata: true"
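A sketch of reading the metadata response follows. It is written defensively because the exact wire shape of meta and links is an assumption here: if a field arrives as a string, the helper treats it as base64-encoded JSON; otherwise it uses the value as-is. Both function names are hypothetical.

```javascript
// Assumption: a string field is base64-encoded JSON; other values are already decoded.
function decodeJsonField(value) {
  if (typeof value !== "string") return value;
  return JSON.parse(Buffer.from(value, "base64").toString("utf8"));
}

// Hypothetical helper: decodes the {body, meta, links} JSON response text.
function decodeMetadataResponse(jsonText) {
  const { body, meta, links } = JSON.parse(jsonText);
  return {
    html: Buffer.from(body, "base64").toString("utf8"),
    meta: decodeJsonField(meta),   // { title, metaDescription, h1, ... }
    links: decodeJsonField(links), // array of URLs/paths found on the page
  };
}
```

After decoding, fields such as meta.title or links.length can be used directly, for example to queue further pages in a crawl.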
HTTP Error Codes
- Invalid Request
- There will be an error message in the HTML; fix your request and retry
Example causes:
- malformed URL
- a localhost URL/IP
- or a page responds with content-type: application/octet-stream
- Rate limited
Requests made without an API token (or with expired/missing billing information) will get this.
- General Error
Example causes:
- 10s (timeout) while waiting for a page to finish rendering (waits until all in-flight requests finish, load event, domContent event etc.)
- or HTTPS Page is making HTTP (non secure) requests
- ...or a random bug on our end
- The error will show up in the headless-render-api.com web UI (after you sign in)
- Bad Gateway (your origin returned 5xx)
- It means we received a 5xx when trying to visit your page
- Or it means we received a 403 (forbidden). Typically this means your page is behind a login wall or a firewall (like Cloudflare) is blocking the headless-render-api user-agent (make an exception to allow user-agents matching /prerendercloud/)
- Probably not retryable. It depends on the site you're requesting. If it's your site, make sure it's up and running correctly
- Over capacity (Rare)
- Retry the request with some backoff
- We'll see the error and our autoscaler will increase capacity within 5 minutes, but you should email us anyway: support@headless-render-api.com
- Gateway Timeout (Rare)
- Retry the request with some backoff
- This is unexpected and should not happen. We'll see it, but you should email us anyway: support@headless-render-api.com
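The "retry with some backoff" advice for the over capacity and gateway timeout cases could be sketched as below. This is an illustration, not an official client feature: the status codes 503 and 504 are assumed from the standard HTTP meanings of those errors, and the helper names, delays, and attempt count are arbitrary choices for this example.

```javascript
// Assumption: over capacity is HTTP 503 and gateway timeout is HTTP 504.
const RETRYABLE_STATUSES = new Set([503, 504]);

// Exponential backoff: 500ms, 1000ms, 2000ms, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls request() (a function returning a promise of a response with a
// numeric `status`) until it succeeds, fails with a non-retryable status,
// or maxAttempts is reached. Returns the last response either way.
async function withRetry(request, maxAttempts = 4, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt++) {
    const res = await request();
    const isLastAttempt = attempt === maxAttempts - 1;
    if (!RETRYABLE_STATUSES.has(res.status) || isLastAttempt) return res;
    await sleep(backoffDelayMs(attempt));
  }
}
```

Usage would look like: const res = await withRetry(() => fetch(url, { headers })); Other errors, such as an invalid request or a bad gateway, are returned immediately rather than retried.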