ROX-33021: Add better DX to develop plugin against sensor-proxy by dvail · Pull Request #19463 · stackrox/stackrox
Description
Adds a script that exposes sensor-proxy via LoadBalancer, a NetworkPolicy to allow external traffic, and a CronJob to auto-cleanup these resources after a short period of time. This script is called automatically when starting a local OpenShift console container for development against the console plugin.
Recommended: view the README changes in split view.
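As a rough sketch of what the script sets up (every name, port, label, image, service account, and the cleanup cadence below are illustrative assumptions, not taken from the PR's actual script), the three dev-only resources could look like:

```shell
#!/usr/bin/env sh
# Hedged sketch of the three resources the dev script creates. All names,
# ports, selectors, and the schedule are assumptions for illustration only.
manifests=$(cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: sensor-proxy-dev            # hypothetical name
  namespace: stackrox
spec:
  type: LoadBalancer                # exposes sensor-proxy externally
  selector:
    app: sensor
  ports:
    - port: 443
      targetPort: 8443
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sensor-proxy-dev-allow      # hypothetical name
  namespace: stackrox
spec:
  podSelector:
    matchLabels:
      app: sensor
  policyTypes: [Ingress]
  ingress:
    - {}                            # allow all inbound traffic (dev only)
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sensor-proxy-dev-cleanup    # hypothetical name
  namespace: stackrox
spec:
  schedule: "*/30 * * * *"          # assumed cleanup cadence
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          serviceAccountName: sensor   # assumed; needs delete permissions
          containers:
            - name: cleanup
              image: registry.redhat.io/openshift4/ose-cli
              command: ["oc", "-n", "stackrox", "delete", "--ignore-not-found",
                        "service/sensor-proxy-dev",
                        "networkpolicy/sensor-proxy-dev-allow",
                        "cronjob/sensor-proxy-dev-cleanup"]
EOF
)
# Applying would be: echo "$manifests" | oc apply -f -
echo "$manifests"
```

Note the CronJob deletes itself last, so a single cleanup run removes all three resources.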
Why do we want this?
This allows plugin developers to start the console in dev mode and connect to an OpenShift cluster via the production network path, instead of directly to a publicly exposed central with a hard-coded API token.
Benefits:
- Develop against the true e2e flow used in production
- Requests are authorized and scoped appropriately via `sensor-proxy` (using the API token results in incorrect data when simulating a plugin response)
- Reduces to a single `start-ocp-console.sh` script, without the need for additional env vars and configuration
What are the security implications, why do we auto-delete?
Auto deletion is included primarily because this is a dev-specific workflow and these are dev-specific resources that will not be intentionally run against production environments.
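The auto-delete decision itself is simple time arithmetic. A minimal sketch, assuming a 30-minute TTL (the actual period is whatever the script's CronJob uses) and GNU `date` for timestamp parsing:

```shell
#!/usr/bin/env sh
# Sketch: decide whether the dev resources have outlived their TTL.
# The TTL value and both timestamps are illustrative assumptions.
TTL_MINUTES=30

# is_expired CREATED_AT NOW -> exit 0 if the TTL has elapsed
is_expired() {
  created_epoch=$(date -u -d "$1" +%s)
  now_epoch=$(date -u -d "$2" +%s)
  [ $(( now_epoch - created_epoch )) -gt $(( TTL_MINUTES * 60 )) ]
}

if is_expired "2024-01-01T10:00:00Z" "2024-01-01T10:45:00Z"; then
  echo "expired: would delete the dev service, networkpolicy, and cronjob"
fi
```

In the real workflow the creation time comes from the resource's `metadata.creationTimestamp`, so the check needs no extra bookkeeping.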
Counterpoints to potential concerns over making this service accessible externally:
- The `sensor-proxy` service exists only as a proxy to `central`, and in addition, only allows access to a subset of APIs on a public `central` service. It does not expose information that is not available via `central` directly.
- The service authorizes requests solely based on in-cluster OpenShift auth. Any request without an access token valid for the target cluster will be rejected immediately.
Additions that we could implement if desired, but I do not feel strongly about:
- Restrict the NetworkPolicy to only allow requests from the user's public IP address
- Check for a variety of common kubectxs (staging, etc.) and bail out early
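Both hardening ideas above are small. A hedged sketch of each (the policy name is hypothetical, the public IP is hard-coded to a documentation address where a real script would discover it, and the context denylist patterns are assumptions):

```shell
#!/usr/bin/env sh
# Sketch 1: render a NetworkPolicy ingress rule limited to the developer's
# public IP. MY_PUBLIC_IP is a stand-in; a real script would discover it.
MY_PUBLIC_IP="203.0.113.7"

policy=$(cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sensor-proxy-dev-allow     # hypothetical name
  namespace: stackrox
spec:
  podSelector:
    matchLabels:
      app: sensor
  policyTypes: [Ingress]
  ingress:
    - from:
        - ipBlock:
            cidr: ${MY_PUBLIC_IP}/32   # only the developer's address
EOF
)
echo "$policy"

# Sketch 2: bail out early on well-known shared contexts. In practice the
# argument would come from `oc config current-context`.
is_safe_context() {
  case "$1" in
    *staging*|*prod*) return 1 ;;   # assumed denylist patterns
    *) return 0 ;;
  esac
}

is_safe_context "staging-cluster" || echo "refusing to run against staging-cluster"
```

The context check is a heuristic; it only catches contexts whose names match the denylist patterns.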
Production request flow
```mermaid
sequenceDiagram
    participant B as Authenticated user browser
    box rgba(0, 128, 255, 0.15) OpenShift cluster
        participant C as openshift-console backend
        participant P as sensor-proxy
        participant O as openshift-oauth service
        participant X as Central
    end
    B->>C: Request
    C->>P: Forward request
    P->>O: Verify auth token
    O-->>P: Auth result
    P->>X: Forward authorized request
    X-->>P: Data
    P-->>C: Data
    C-->>B: Data
```
Old development request flow
```mermaid
sequenceDiagram
    participant B as Unauthenticated user browser
    box rgba(0, 200, 100, 0.15) Local Podman
        participant C as openshift-console
    end
    box rgba(0, 128, 255, 0.15) OpenShift cluster
        participant P as sensor-proxy (ignored)
        participant X as Central
    end
    Note over C: Configure API token
    Note over C: Configure Central endpoint
    Note over C: Configure base path
    B->>C: Request with hard-coded API token
    Note over C,P: sensor-proxy is bypassed
    C->>X: Forward request directly
    X->>X: Validate by API token presence
    X-->>C: Return unfiltered data
    C-->>B: Return data
```
New development request flow
```mermaid
sequenceDiagram
    participant B as Unauthenticated user browser
    box rgba(0, 200, 100, 0.15) Local Podman
        participant C as openshift-console
    end
    box rgba(0, 128, 255, 0.15) OpenShift cluster
        participant P as sensor-proxy
        participant X as Central
    end
    B->>C: Request
    C->>C: Inject OpenShift auth token
    C->>P: Forward request transparently
    P->>P: Authorize using auth token
    P->>X: Forward authorized request
    X-->>P: Data
    P-->>C: Data
    C-->>B: Data
```
User-facing documentation
- CHANGELOG.md is updated OR update is not needed
- documentation PR is created and is linked above OR is not needed
Testing and quality
- the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
- CI results are inspected
Automated testing
- added unit tests
- added e2e tests
- added regression tests
- added compatibility tests
- modified existing tests
How I validated my change
Verify that `./scripts/start-ocp-console.sh` starts the local Console correctly and loads the dev version of the dynamic plugin.
Verify that data visible in the console plugin is correctly scoped to the cluster and namespace it belongs to.
Verify that the resources are created correctly, and are automatically cleaned up after the time limit with:
```shell
oc -n stackrox get cronjobs
oc -n stackrox get services
oc -n stackrox get networkpolicy
```