Scrape API (pre-rendering, server-side rendering, screenshots, PDFs, and scraping) | Headless-Render-API

This document describes the Scrape functionality/endpoint, its associated HTTP header options, and possible error responses.

Endpoint: service.headless-render-api.com

Scrape Path: /scrape/$URL

Example URL: 

https://service.headless-render-api.com/scrape/https://example.com/

The target $URL is appended to the path as-is. Do not encode or escape it beyond what you would type into a browser's address bar. Both raw UTF-8 and percent-encoded URLs are accepted.
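
As a minimal sketch of that rule, the helper below validates the target URL and then appends the original string untouched. `buildScrapeUrl` and `SCRAPE_BASE` are hypothetical names used for illustration, not part of any client library.

```javascript
// Hypothetical helper: builds a Scrape API URL by appending the target as-is.
const SCRAPE_BASE = "https://service.headless-render-api.com/scrape/";

function buildScrapeUrl(targetUrl) {
  // Parse only to validate; new URL() throws a TypeError on malformed input.
  const parsed = new URL(targetUrl);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  // Append the original string, without encodeURIComponent or re-escaping.
  return SCRAPE_BASE + targetUrl;
}

console.log(buildScrapeUrl("https://example.com/"));
// prints "https://service.headless-render-api.com/scrape/https://example.com/"
```

Validating up front catches malformed URLs locally instead of spending a request on them.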

Auth

Send your secret API token (you'll get it after creating an account) as a header with all your requests to avoid rate limiting.

  • curl --header "X-Prerender-Token: secret-token"
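
The same header can be sent from Node.js. This is a sketch assuming Node.js 18+ (global `fetch`); `authHeaders`, `scrapeHtml`, and the `PRERENDER_TOKEN` environment variable are hypothetical names chosen for this example.

```javascript
// Sketch: attach the API token to every request (Node.js 18+, global fetch).
// PRERENDER_TOKEN is an assumed environment variable name.
function authHeaders(token = process.env.PRERENDER_TOKEN) {
  // With no token, send no auth header; such requests may be rate limited.
  return token ? { "X-Prerender-Token": token } : {};
}

async function scrapeHtml(targetUrl, token) {
  const res = await fetch(
    "https://service.headless-render-api.com/scrape/" + targetUrl,
    { headers: authHeaders(token) }
  );
  return { status: res.status, html: await res.text() };
}

// Usage (performs a real network request):
// scrapeHtml("https://example.com/", "secret-token")
//   .then(({ status, html }) => console.log(status, html.length));
```

Reading the token from the environment keeps the secret out of source control.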

Webpage Scrape API

A scraping API. Try it in your browser: service.headless-render-api.com/scrape/https://example.com

cURL

curl https://service.headless-render-api.com/scrape/https://example.com/ > out.html

Node.js client

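// npm install prerendercloud
const prerendercloud = require("prerendercloud");

// Run inside an async function (or an ES module, where top-level await is allowed).
const { body, meta, screenshot, statusCode, headers } = await prerendercloud.scrape("https://example.com/", {
  withMetadata: true,
  withScreenshot: true,
});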

HTTP Header Options

  • Waits longer than normal for a page to finish rendering. Similar to Puppeteer's { waitUntil: 'networkidle' }. Useful for pages that depend on late-firing AJAX/XHR requests, or for IPFS-hosted pages.

  • curl --header "Prerender-Wait-Extra-Long: true"
  • By default, headless-render-api.com waits for all WebSocket activity to finish, but waiting makes no sense for connections that never stop. An example: real-time prices on a stock price website. Use this to skip the wait.

  • curl --header "prerender-dont-wait-for-web-sockets: true"
  • By default, headless-render-api.com sends cookies back to the server. Use this to block them.

  • curl --header "prerender-block-cookies: true"
  • By default, if your origin server returns a 301/302, headless-render-api.com returns that response as-is, which is appropriate for the common use case of proxying traffic through headless-render-api.com. If you are using the API for crawling, scraping, or batching, you may want to follow redirects instead.

  • curl --header "Prerender-Follow-Redirects: true"
  • Overrides device screen width (default: 1366)

  • curl --header "prerender-device-width: 800"
  • Overrides device screen height (default: 768)

  • curl --header "prerender-device-height: 600"
  • Whether to emulate a mobile device (default: false).

    This includes the viewport meta tag, overlay scrollbars, text autosizing, and more. In other words, it controls whether the meta viewport tag is taken into account.

    An example of a viewport meta tag: <meta name="viewport" content="width=device-width, initial-scale=1">

  • curl --header "prerender-device-is-mobile: true"
  • Emulates the given media type or media feature for CSS media queries

    default: screen

    Possible values are: screen, print, braille, embossed, handheld, projection, speech, tty, tv

  • curl --header "Prerender-Emulated-Media: screen"
  • Changes the response from text/html to application/json and returns an object with 2 fields: {body, screenshot}

    The values are base64 encoded:

    • body is the normal pre-rendered response
    • screenshot is a base64 encoded PNG

    This is useful for saving screenshots or injecting them as Open Graph images. Often used with Prerender-With-Metadata.

  • curl --header "Prerender-With-Screenshot: true"
  • Changes the response from text/html to application/json and returns an object with 3 fields: {body, meta, links}

    The values are base64 encoded:

    • body is the normal pre-rendered response
    • meta is an object that includes { title, metaDescription, h1, ogImage, ogTitle, twitterCard }
    • links is an array of URLs/paths on the page

    This is useful for capturing SEO-relevant metadata during the pre-render.

  • curl --header "Prerender-With-Metadata: true"
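
The header options above can be collected into one request-headers object. The sketch below uses a hypothetical camelCase-to-header mapping (`OPTION_HEADERS`) and a hypothetical helper (`buildOptionHeaders`). The header names are copied from the curl examples. Skipping `false` values assumes every boolean header is opt-in, as the stated defaults suggest.

```javascript
// Hypothetical mapping from option names to the headers shown in the curl examples.
const OPTION_HEADERS = {
  waitExtraLong: "Prerender-Wait-Extra-Long",
  dontWaitForWebSockets: "prerender-dont-wait-for-web-sockets",
  blockCookies: "prerender-block-cookies",
  followRedirects: "Prerender-Follow-Redirects",
  deviceWidth: "prerender-device-width",
  deviceHeight: "prerender-device-height",
  deviceIsMobile: "prerender-device-is-mobile",
  emulatedMedia: "Prerender-Emulated-Media",
  withScreenshot: "Prerender-With-Screenshot",
  withMetadata: "Prerender-With-Metadata",
};

function buildOptionHeaders(options = {}) {
  const headers = {};
  for (const [name, value] of Object.entries(options)) {
    const header = OPTION_HEADERS[name];
    if (!header) throw new Error(`Unknown option: ${name}`);
    // Assumption: omit unset or false options so the service defaults apply.
    if (value === undefined || value === null || value === false) continue;
    headers[header] = String(value); // header values are always strings
  }
  return headers;
}

// Example: follow redirects and render at 800x600.
console.log(buildOptionHeaders({ followRedirects: true, deviceWidth: 800, deviceHeight: 600 }));
```

Rejecting unknown option names up front guards against typos that would otherwise be ignored silently.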

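When Prerender-With-Screenshot or Prerender-With-Metadata is set, the JSON response has to be decoded before use. The sketch below is one way to do that in Node.js. `decodeScrapeJson` is a hypothetical helper, and because the exact encoding of `meta` and `links` is an assumption here, it accepts them either as base64-encoded JSON strings or as plain JSON values.

```javascript
// Sketch: decode the application/json payload of a screenshot/metadata response.
function decodeBase64(value) {
  return Buffer.from(value, "base64");
}

function decodeScrapeJson(payload) {
  const result = {};
  if (payload.body !== undefined) {
    result.body = decodeBase64(payload.body).toString("utf8"); // pre-rendered HTML
  }
  if (payload.screenshot !== undefined) {
    result.screenshot = decodeBase64(payload.screenshot); // PNG bytes as a Buffer
  }
  // Assumption: meta and links may be base64-encoded JSON strings or plain values.
  for (const key of ["meta", "links"]) {
    const value = payload[key];
    if (value === undefined) continue;
    result[key] = typeof value === "string"
      ? JSON.parse(decodeBase64(value).toString("utf8"))
      : value;
  }
  return result;
}

// Example with a hand-made payload (not a real API response):
const sample = {
  body: Buffer.from("<h1>Hello</h1>").toString("base64"),
  meta: Buffer.from(JSON.stringify({ title: "Hello" })).toString("base64"),
  links: ["/about"],
};
const decoded = decodeScrapeJson(sample);
console.log(decoded.body);       // prints "<h1>Hello</h1>"
console.log(decoded.meta.title); // prints "Hello"
```

The decoded `screenshot` Buffer can be written straight to disk, for example with `fs.promises.writeFile("screenshot.png", decoded.screenshot)`.
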
HTTP Error Codes

  • Invalid Request
    • There will be an error message in the HTML. Fix your request and retry.
    • Example causes:
      • a malformed URL
      • a localhost URL/IP
      • a page that responds with content-type: application/octet-stream
  • Rate limited
    • Requests made without an API token (or with expired or missing billing information) will get this. See pricing.
  • General Error
    • Example causes:
      • a 10s timeout while waiting for a page to finish rendering (rendering waits for all in-flight requests to finish, the load event, the DOMContentLoaded event, etc.)
      • an HTTPS page making HTTP (non-secure) requests
      • a random bug on our end
    • The error will show up in the headless-render-api.com web UI (after you sign in).
  • Bad Gateway (your origin returned 5xx)
    • It means we received a 5xx when trying to visit your page.
    • Or it means we received a 403 (Forbidden). Typically this means your page is behind a login wall, or a firewall (like Cloudflare) is blocking the headless-render-api user-agent (make an exception to allow user-agents matching /prerendercloud/).
    • Probably not retryable. It depends on the site you're requesting. If it's your site, make sure it's up and running correctly.
  • Over capacity (Rare)
    • Retry the request with some backoff.
    • We'll see the error and our autoscaler will increase capacity within 5 minutes, but you should email us anyway: support@headless-render-api.com
  • Gateway Timeout (Rare)
    • Retry the request with some backoff.
    • This is unexpected and should not happen. We'll see it, but you should email us anyway: support@headless-render-api.com
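
The "retry with some backoff" advice for the Over capacity and Gateway Timeout errors could be sketched as below. The status codes 503 and 504 are assumptions based on standard HTTP meanings, since the list above does not show numeric codes. `backoffDelayMs` and `fetchWithRetry` are hypothetical helpers, and the delay values are arbitrary.

```javascript
// Assumed status codes for the retryable errors (standard HTTP meanings):
// 503 for "Over capacity" and 504 for "Gateway Timeout".
const RETRYABLE_STATUSES = new Set([503, 504]);

// Exponential backoff with a cap: 500ms, 1s, 2s, 4s, ... up to maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// doFetch defaults to the global fetch (Node.js 18+); pass a stub for testing.
async function fetchWithRetry(url, init = {}, maxAttempts = 4, doFetch = fetch) {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch(url, init);
    const lastAttempt = attempt === maxAttempts - 1;
    // Return successes, non-retryable errors, and the final failed attempt.
    if (!RETRYABLE_STATUSES.has(res.status) || lastAttempt) return res;
    await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
  }
}

// Usage (performs real network requests):
// fetchWithRetry("https://service.headless-render-api.com/scrape/https://example.com/")
//   .then((res) => console.log(res.status));
```

Other errors are returned to the caller unretried, since the list above describes them as request or origin problems that a retry alone is unlikely to fix.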