feat(portal): optionally terminate tls in prod by jamilbk · Pull Request #12689 · firezone/firezone

Summary

We're migrating from Azure L7 load balancers to L4 (TCP passthrough) load balancers. L4 load balancers don't terminate TLS, so the Phoenix application needs to handle TLS termination itself.

This PR introduces Portal.CertCache, a GenServer that caches parsed TLS certificates in memory, and wires it into the sni_fun callback (an Erlang :ssl option that Bandit passes through), so every TLS handshake is served from the cache without touching disk.

How it works

  • Portal.CertCache is a GenServer that accepts a fetch_fn (0-arity function returning PEM data), parses PEM into DER format on init, and holds the parsed cert chain + key in process state.
  • Two named instances are started — Portal.CertCache.Web and Portal.CertCache.Api — since the web and API endpoints serve different TLDs with separate certificates. Each instance gets a unique supervisor child ID via child_spec/1.
  • Bandit's sni_fun callback calls Portal.CertCache.get_opts/1 on each TLS handshake to retrieve the cached DER certs. Each endpoint points its sni_fun at its own CertCache instance. The sni_fun is passed through thousand_island_options: [transport_options: [sni_fun: ...]] since Bandit delegates SSL options to Thousand Island's transport layer.
  • Enablement is driven by port env vars — setting PHOENIX_HTTPS_WEB_PORT enables HTTPS on the web endpoint, PHOENIX_HTTPS_API_PORT on the API endpoint. When unset, endpoints continue to serve HTTP only. Both HTTP and HTTPS can run simultaneously (useful for health checks).
  • CertCache.refresh/1 allows updating certs at runtime without restart. On refresh failure, the stale cert is retained and a warning is logged. On init failure, the GenServer crashes (no stale cert to fall back on), which prevents the endpoint from accepting traffic until certs are available.
  • PortalOps stays HTTP-only — it's internal and doesn't need TLS.
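
The flow above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the module name CertCacheSketch, the `{:ok, pem} | {:error, reason}` return shape of fetch_fn, the `:invalid_pem` error atom, and the exact shape of the returned options are all assumptions. It relies only on `:public_key.pem_decode/1` and the `cert:` / `key:` options that Erlang's :ssl accepts.

```elixir
defmodule CertCacheSketch do
  @moduledoc "Illustrative stand-in for Portal.CertCache (hypothetical, not the PR's code)."
  use GenServer
  require Logger

  # Key entry types that :ssl accepts in its `key: {type, der}` option.
  @key_types [:RSAPrivateKey, :ECPrivateKey, :PrivateKeyInfo, :DSAPrivateKey]

  # A distinct child ID per name lets two instances share one supervisor.
  def child_spec(opts) do
    %{id: Keyword.fetch!(opts, :name), start: {__MODULE__, :start_link, [opts]}}
  end

  def start_link(opts) do
    fetch_fn = Keyword.fetch!(opts, :fetch_fn)
    GenServer.start_link(__MODULE__, fetch_fn, name: Keyword.fetch!(opts, :name))
  end

  def get_opts(server), do: GenServer.call(server, :get_opts)
  def refresh(server), do: GenServer.call(server, :refresh)

  @doc "Splits PEM data into DER certs plus one DER key, shaped as :ssl options."
  def parse_pem(pem) when is_binary(pem) do
    entries = :public_key.pem_decode(pem)
    certs = for {:Certificate, der, :not_encrypted} <- entries, do: der
    keys = for {type, der, :not_encrypted} <- entries, type in @key_types, do: {type, der}

    case {certs, keys} do
      {[_ | _], [key | _]} -> {:ok, [cert: certs, key: key]}
      _ -> {:error, :invalid_pem}
    end
  end

  def parse_pem(_other), do: {:error, :invalid_pem}

  @impl true
  def init(fetch_fn) do
    case load(fetch_fn) do
      {:ok, ssl_opts} -> {:ok, %{fetch_fn: fetch_fn, ssl_opts: ssl_opts}}
      # No stale cert exists yet, so startup fails instead of serving nothing.
      {:error, reason} -> {:stop, reason}
    end
  end

  @impl true
  def handle_call(:get_opts, _from, state), do: {:reply, state.ssl_opts, state}

  def handle_call(:refresh, _from, state) do
    case load(state.fetch_fn) do
      {:ok, ssl_opts} ->
        {:reply, :ok, %{state | ssl_opts: ssl_opts}}

      {:error, reason} ->
        # Keep serving the previously cached cert.
        Logger.warning("cert refresh failed, keeping stale cert: #{inspect(reason)}")
        {:reply, {:error, reason}, state}
    end
  end

  # Assumes fetch_fn returns {:ok, pem} or {:error, reason}.
  defp load(fetch_fn) do
    with {:ok, pem} <- fetch_fn.(), do: parse_pem(pem)
  end
end
```

With a cache like this running under a name, an sni_fun can be as small as `fn _server_name -> CertCacheSketch.get_opts(name) end`, since :ssl expects the callback to return a list of SSL options for the requested server name.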

Dev environment

In dev, both CertCache instances read from the existing self-signed certs at priv/cert/. The PortalWeb endpoint's config was updated from static certfile/keyfile to sni_fun, so dev now exercises the same code path as production.
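
For illustration, the dev endpoint config after this change might look roughly like the fragment below. The `:portal` app name, the file it lives in, and the exact option layout are assumptions; only the `thousand_island_options: [transport_options: [sni_fun: ...]]` nesting, the CertCache names, and port 13443 come from this description.

```elixir
# Sketch of a dev config fragment; the app name and file location are assumptions.
import Config

config :portal, PortalWeb.Endpoint,
  https: [
    port: 13443,
    thousand_island_options: [
      transport_options: [
        # Replaces the static certfile/keyfile pair: every handshake asks the
        # in-memory cache for the DER cert chain and key.
        sni_fun: fn _server_name ->
          Portal.CertCache.get_opts(Portal.CertCache.Web)
        end
      ]
    ]
  ]
```

The requested server name is ignored in this sketch because each endpoint has exactly one certificate; the callback only decides which cache instance to consult.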

Future work

  • Azure Key Vault integration (next PR): Replace the file-based fetch_fn with an API client that fetches certs from Azure Key Vault using Managed Identity, with periodic refresh.
  • Cert rotation: The refresh/1 API is already in place — the Key Vault PR will add a timer to periodically re-fetch and call it.
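
One possible shape for that timer, sketched under assumptions: the module name CertRefresherSketch and its `:refresh_fn` / `:interval_ms` options are hypothetical and not part of this PR. In practice `refresh_fn` would presumably wrap a call such as `Portal.CertCache.refresh(Portal.CertCache.Web)`.

```elixir
defmodule CertRefresherSketch do
  @moduledoc "Hypothetical periodic refresher; names and options are illustrative."
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      refresh_fn: Keyword.fetch!(opts, :refresh_fn),
      interval_ms: Keyword.fetch!(opts, :interval_ms)
    }

    schedule(state.interval_ms)
    {:ok, state}
  end

  @impl true
  def handle_info(:refresh, state) do
    # The result is ignored: per the description above, a failed refresh keeps
    # the stale cert and logs a warning inside the cache itself.
    _ = state.refresh_fn.()
    schedule(state.interval_ms)
    {:noreply, state}
  end

  # Re-arms a one-shot timer after each run, so runs never overlap.
  defp schedule(interval_ms), do: Process.send_after(self(), :refresh, interval_ms)
end
```

Re-arming a one-shot timer after each refresh, rather than using a fixed-rate interval, means a slow fetch delays the next attempt instead of queueing several.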

Test plan

  • Portal.CertCacheTest — 7 tests covering PEM parsing, init, init failure, refresh, and refresh failure
  • PortalWeb.EndpointTest — starts Bandit HTTPS with sni_fun → CertCache, connects via :ssl, verifies returned cert DER matches
  • PortalAPI.EndpointTest — same as above with a separate cert (CN=api-test), verifying per-endpoint cert isolation
  • Full test suite passes (2929 tests, 0 failures)
  • Dialyzer passes (0 errors)
  • Manual: mix phx.server → verify https://localhost:13443 works via sni_fun
  • Deploy with PHOENIX_HTTPS_WEB_PORT and PHOENIX_HTTPS_API_PORT set, verify TLS termination
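
The endpoint tests and the manual check both come down to connecting over TLS and inspecting the certificate the server presents. A minimal helper for that, assuming a hypothetical module name PeerCertSketch and using only `:ssl.connect/4` and `:ssl.peercert/1`, could look like this:

```elixir
defmodule PeerCertSketch do
  @doc """
  Connects to host:port over TLS and returns {:ok, der} with the server's
  leaf certificate, or {:error, reason} if the connection fails.
  """
  def fetch_der(host, port, timeout \\ 5_000) do
    # verify: :verify_none because the dev and test certs are self-signed.
    opts = [verify: :verify_none]

    with {:ok, _apps} <- Application.ensure_all_started(:ssl),
         {:ok, socket} <- :ssl.connect(String.to_charlist(host), port, opts, timeout) do
      result = :ssl.peercert(socket)
      :ssl.close(socket)
      result
    end
  end
end
```

For example, `PeerCertSketch.fetch_der("localhost", 13443)` against a running dev server should return a DER binary that can be compared with the first entry of the `cert:` list held by the matching cache instance, assuming the cache exposes the chain in that form.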

🤖 Generated with Claude Code