[MAX] Add file-backed media responses for OpenResponses pixel generation by pei0033 · Pull Request #6341 · modular/modular

Summary

This PR improves OpenResponses-based pixel-generation serving by adding file-backed media responses and response-format selection for generated outputs.

Specifically, it:

adds file-backed image/video response handling for /v1/responses
supports response_format: "url" and response_format: "b64_json" for generated media
validates that the requested model matches the currently served model and returns 404 on mismatch
adds video-output handling in the shared pixel-generation pipeline so video responses can be serialized correctly
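To make the two response formats concrete, here is a minimal sketch of how a serializer might choose between a file-backed URL and an inline base64 payload. It is not the code in this PR: `serialize_image`, `MEDIA_DIR`, and the on-disk layout are hypothetical names chosen for illustration. Only the `image_url` / `image_data` field names and the `/v1/images/{image_id}/content` path come from the description.

```python
import base64
import tempfile
import uuid
from pathlib import Path

# Hypothetical storage directory; the PR's real storage location is not shown here.
MEDIA_DIR = Path(tempfile.mkdtemp(prefix="media_"))


def serialize_image(data: bytes, response_format: str = "url") -> dict:
    """Return a response fragment for generated image bytes (illustrative only)."""
    if response_format == "b64_json":
        # Inline payload: the client gets the bytes directly, base64-encoded.
        return {"image_data": base64.b64encode(data).decode("ascii")}
    if response_format == "url":
        # File-backed payload: persist the bytes and hand back a fetchable path.
        image_id = uuid.uuid4().hex
        (MEDIA_DIR / image_id).write_bytes(data)
        return {"image_url": f"/v1/images/{image_id}/content"}
    raise ValueError(f"unsupported response_format: {response_format!r}")
```

The trade-off this illustrates: `url` keeps the JSON response small and defers the transfer, while `b64_json` avoids a second round trip at the cost of a larger body.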

Example local flow validated during development:

Start a server:

MAX_SERVE_API_TYPES='["responses"]' \
./bazelw run //max/python/max/entrypoints:pipelines -- serve \
  --model-path black-forest-labs/FLUX.2-klein-4B \
  --task pixel_generation \
  --port 8000 \
  --devices gpu \
  --prefer-module-v3

Send a text-to-image (T2I) request:

cat >/tmp/flux_t2i_request.json <<'EOF'
{
  "model": "black-forest-labs/FLUX.2-klein-4B",
  "input": "A studio portrait of a tabby cat with dramatic lighting.",
  "seed": 42,
  "provider_options": {
    "image": {
      "guidance_scale": 4.0,
      "output_format": "png",
      "response_format": "url",
      "width": 512,
      "height": 512,
      "steps": 4
    }
  }
}
EOF

curl -sS http://127.0.0.1:8000/v1/responses \
  -H 'Content-Type: application/json' \
  --data @/tmp/flux_t2i_request.json \
  > /tmp/flux_t2i_response.json
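For scripted testing, the same request can be built from Python using only the standard library. This is a rough equivalent of the curl call above, not part of the PR; `build_t2i_request` is a hypothetical helper, and the payload fields simply mirror the JSON shown in this description.

```python
import json
import urllib.request


def build_t2i_request(
    base_url: str,
    model: str,
    prompt: str,
    response_format: str = "url",
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/responses mirroring the curl example."""
    payload = {
        "model": model,
        "input": prompt,
        "seed": 42,
        "provider_options": {
            "image": {
                "guidance_scale": 4.0,
                "output_format": "png",
                "response_format": response_format,
                "width": 512,
                "height": 512,
                "steps": 4,
            }
        },
    }
    return urllib.request.Request(
        f"{base_url}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending requires a running server, e.g.:
#   req = build_t2i_request("http://127.0.0.1:8000",
#                           "black-forest-labs/FLUX.2-klein-4B", "A tabby cat")
#   with urllib.request.urlopen(req) as resp:
#       body = json.load(resp)
```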

With response_format: "url", the response includes an image_url that can
be fetched from /v1/images/{image_id}/content. With
response_format: "b64_json", the response includes inline image_data.
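A client can handle both formats with one code path. The sketch below assumes only that the response JSON contains an `image_url` or `image_data` key somewhere in its structure; the exact nesting is not specified in this description, so the helper searches recursively. `find_key` and `extract_image` are hypothetical names.

```python
import base64


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def extract_image(response: dict):
    """Return ("bytes", raw_bytes) for inline data or ("url", path) for file-backed data."""
    data = find_key(response, "image_data")
    if data is not None:
        return ("bytes", base64.b64decode(data))
    url = find_key(response, "image_url")
    if url is not None:
        return ("url", url)
    raise KeyError("response contains neither image_data nor image_url")
```

When the result is a `"url"`, the caller still needs a second GET against the server to download the file.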

Testing

./bazelw test //max/tests/tests/serve:test_openresponses_routes

//max/tests/tests/serve:test_openresponses_routes specifically verifies that:

basic /v1/responses requests still succeed
requests are rejected when the requested model does not match the served model
video responses can be returned as downloadable file-backed URLs
video responses can be returned as inline base64 payloads when response_format: "b64_json" is requested

I also manually verified:

T2I with black-forest-labs/FLUX.2-klein-4B
response_format: "url" returns image_url
response_format: "b64_json" returns inline image_data
requesting a different model name than the served model returns 404
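The model-mismatch behavior can be pictured as a simple equality check in front of the generation pipeline. The following is an illustrative sketch only: `check_requested_model` is a hypothetical function, and the actual route handler in this PR may structure the check and error body differently.

```python
from http import HTTPStatus


def check_requested_model(requested: str, served: str):
    """Return (status, error_message); the message is None when the model matches."""
    if requested == served:
        return HTTPStatus.OK, None
    # Mismatch maps to 404, as described in this PR.
    return (
        HTTPStatus.NOT_FOUND,
        f"model {requested!r} is not served here (serving {served!r})",
    )
```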

Checklist

PR is small and focused (consider splitting larger changes into a sequence of smaller PRs)
I ran ./bazelw run format to format my changes
I added or updated tests to cover my changes
If AI tools assisted with this contribution, I have included an Assisted-by: trailer in my commit message or this PR description (see AI Tool Use Policy)

Assisted-by: OpenAI Codex