[MAX] Add file-backed media responses for OpenResponses pixel generation by pei0033 · Pull Request #6341 · modular/modular

Summary

This PR improves OpenResponses-based pixel-generation serving by adding file-backed media responses and response-format selection for generated outputs.

Specifically, it:

  • adds file-backed image/video response handling for /v1/responses
  • supports response_format: "url" and response_format: "b64_json" for generated media
  • validates that the requested model matches the currently served model and returns 404 on mismatch
  • adds video-output handling in the shared pixel-generation pipeline so video responses can be serialized correctly
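The response-format selection listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `serialize_image`, the `media_dir` argument, and the UUID-based image ID are hypothetical. Only the `image_url` and `image_data` field names and the `/v1/images/{image_id}/content` path are taken from this description.

```python
import base64
import uuid
from pathlib import Path


def serialize_image(data: bytes, response_format: str, media_dir: Path) -> dict:
    """Return a response fragment for generated image bytes (hypothetical helper)."""
    if response_format == "b64_json":
        # Inline: embed the bytes as base64 text in the response body.
        return {"image_data": base64.b64encode(data).decode("ascii")}
    if response_format == "url":
        # File-backed: persist the bytes and return a path the client can fetch.
        image_id = uuid.uuid4().hex  # assumption: any unique ID scheme would do
        media_dir.mkdir(parents=True, exist_ok=True)
        (media_dir / image_id).write_bytes(data)
        return {"image_url": f"/v1/images/{image_id}/content"}
    raise ValueError(f"unsupported response_format: {response_format!r}")
```

The file-backed branch keeps large payloads out of the JSON body, while the inline branch avoids a second round trip; which one the server should default to is not stated here.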

Example local flow validated during development:

  1. Start a server:

    MAX_SERVE_API_TYPES='["responses"]' \
    ./bazelw run //max/python/max/entrypoints:pipelines -- serve \
      --model-path black-forest-labs/FLUX.2-klein-4B \
      --task pixel_generation \
      --port 8000 \
      --devices gpu \
      --prefer-module-v3

  2. Send a text-to-image (T2I) request:

    cat >/tmp/flux_t2i_request.json <<'EOF'
    {
      "model": "black-forest-labs/FLUX.2-klein-4B",
      "input": "A studio portrait of a tabby cat with dramatic lighting.",
      "seed": 42,
      "provider_options": {
        "image": {
          "guidance_scale": 4.0,
          "output_format": "png",
          "response_format": "url",
          "width": 512,
          "height": 512,
          "steps": 4
        }
      }
    }
    EOF
    
    curl -sS http://127.0.0.1:8000/v1/responses \
      -H 'Content-Type: application/json' \
      --data @/tmp/flux_t2i_request.json \
      > /tmp/flux_t2i_response.json

With response_format: "url", the response includes an image_url that can
be fetched from /v1/images/{image_id}/content. With
response_format: "b64_json", the response includes inline image_data.
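A client can handle both formats with one helper. The sketch below is an assumption-heavy illustration: this description names the `image_url` and `image_data` fields but does not show where they sit in the response body, so the hypothetical `find_values` helper searches the whole JSON tree instead of assuming a fixed path.

```python
import base64
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def extract_images(response: dict) -> tuple[list[str], list[bytes]]:
    """Return (URLs to fetch, decoded inline payloads) found in a response."""
    urls = [u for u in find_values(response, "image_url") if isinstance(u, str)]
    blobs = [
        base64.b64decode(d)
        for d in find_values(response, "image_data")
        if isinstance(d, str)
    ]
    return urls, blobs
```

With `response_format: "url"` the first list would be populated and each entry still needs a follow-up GET; with `response_format: "b64_json"` the second list already holds the image bytes.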

Testing

  • ./bazelw test //max/tests/tests/serve:test_openresponses_routes

//max/tests/tests/serve:test_openresponses_routes specifically verifies that:

  • basic /v1/responses requests still succeed
  • requests are rejected when the requested model does not match the served model
  • video responses can be returned as downloadable file-backed URLs
  • video responses can be returned as inline base64 payloads when response_format: "b64_json" is requested
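The model-mismatch case covered by this test can be sketched as a small check. This is not the test or the route code from the PR: `check_requested_model` is a hypothetical name, the error message is invented for illustration, and exact string comparison is an assumption (the real server may normalize or alias model names).

```python
from http import HTTPStatus


def check_requested_model(requested: str, served: str) -> tuple[int, str]:
    """Return (status code, message) for a request naming `requested`."""
    if requested != served:
        # Mismatch: the description states the server answers with 404.
        return int(HTTPStatus.NOT_FOUND), f"model {requested!r} is not being served"
    return int(HTTPStatus.OK), ""


# Usage: one matching and one mismatching request.
served = "black-forest-labs/FLUX.2-klein-4B"
ok_status, _ = check_requested_model(served, served)
bad_status, bad_message = check_requested_model("some-other-model", served)
```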

I also manually verified:

  • T2I with black-forest-labs/FLUX.2-klein-4B
  • response_format: "url" returns image_url
  • response_format: "b64_json" returns inline image_data
  • requesting a different model name than the served model returns 404
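These manual checks can be repeated from Python by varying the model name and `response_format` in the request shown earlier. The sketch below reuses the payload fields from the curl example; the function names, the default base URL, and the choice to return an empty body on error are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request


def build_t2i_request(model: str, prompt: str, response_format: str = "url") -> dict:
    """Build the same payload as the curl example, with a selectable format."""
    if response_format not in ("url", "b64_json"):
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {
        "model": model,
        "input": prompt,
        "seed": 42,
        "provider_options": {
            "image": {
                "guidance_scale": 4.0,
                "output_format": "png",
                "response_format": response_format,
                "width": 512,
                "height": 512,
                "steps": 4,
            }
        },
    }


def post_responses(payload: dict, base_url: str = "http://127.0.0.1:8000") -> tuple[int, dict]:
    """POST to /v1/responses and return (HTTP status, parsed JSON body)."""
    req = urllib.request.Request(
        f"{base_url}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Non-2xx statuses (such as the 404 on model mismatch) arrive as HTTPError.
        return err.code, {}
```

Calling `post_responses(build_t2i_request("black-forest-labs/FLUX.2-klein-4B", "A tabby cat", "b64_json"))` against a running server would exercise the inline path; passing a different model name should, per the list above, yield a 404 status.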

Checklist

  • PR is small and focused — consider splitting larger changes into a
    sequence of smaller PRs
  • I ran ./bazelw run format to format my changes
  • I added or updated tests to cover my changes
  • If AI tools assisted with this contribution, I have included an
    Assisted-by: trailer in my commit message or this PR description
    (see AI Tool Use Policy)

Assisted-by: OpenAI Codex