[MAX] Align FLUX.2-dev prompt masking with diffusers by pei0033 · Pull Request #6331 · modular/modular

Summary

This PR extends the masking approach from PR #6153 to the FLUX.2-dev/Mistral3 text-encoder path.

Instead of compacting padded tokens in PixelGenerationTokenizer, we now preserve the padded prompt sequence and pass the tokenizer-generated attention_mask through the FLUX.2 pipeline into the Mistral3 text encoder. The text encoder materializes an additive attention bias from that mask and uses masked flash attention, matching the masking model already used for FLUX.2-Klein.
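In outline, the mask-to-bias step works like the minimal NumPy sketch below. The function names, shapes, and the -1e9 fill value are illustrative only, not the pipeline's actual code; the real path uses the shared helper and flash attention kernels described in this PR:

```python
import numpy as np

def attention_bias_from_mask(attention_mask: np.ndarray) -> np.ndarray:
    """Turn a 0/1 tokenizer attention mask [batch, seq] into an additive
    bias [batch, 1, 1, seq]: 0.0 at attended keys, a large negative
    value at padded keys (hypothetical stand-in for the shared helper)."""
    bias = (1.0 - attention_mask.astype(np.float32)) * -1e9
    return bias[:, None, None, :]

def masked_attention(q, k, v, bias):
    """Plain scaled dot-product attention with an additive key bias.
    q, k, v: [batch, heads, seq, head_dim]; bias broadcasts over
    heads and query positions, so padded keys get ~zero weight."""
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    scores = scores + bias
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v
```

Because the bias is added before the softmax, padded key positions receive effectively zero attention weight, so padding can no longer leak into the prompt embeddings.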

What Changed

  • removed the FLUX.2-specific token compaction workaround from PixelGenerationTokenizer
  • threaded attention_mask through flux2 and flux2_modulev3 prompt-embedding paths
  • added a shared attention_bias_from_attention_mask_array(...) helper
  • updated graph and modulev3 Mistral3 text encoders to consume additive attention bias
  • switched Mistral3 text attention to masked flash attention
  • reused the shared helper in the existing Qwen3 text-encoder path for consistency
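At the attention level, the removed compaction workaround and the new mask-based path compute the same thing for the real tokens. A hypothetical NumPy check of that equivalence (single head, no positional encoding; all names here are illustrative, not the repo's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention(q, k, v, bias=0.0):
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1]) + bias
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq, real, d = 6, 4, 8  # 6 tokens total, last 2 are padding
q = rng.standard_normal((seq, d))
k = rng.standard_normal((seq, d))
v = rng.standard_normal((seq, d))

# Old path: compact away padded tokens before attention.
compacted = attention(q[:real], k[:real], v[:real])

# New path: keep padding, mask padded keys with a large negative bias.
mask = np.zeros(seq)
mask[:real] = 1.0
bias = (1.0 - mask) * -1e9  # [seq], broadcasts over query positions
masked = attention(q, k, v, bias)

# Outputs at the real positions agree; padding never influences them.
assert np.allclose(compacted, masked[:real], atol=1e-6)
```

This is why the tokenizer-side compaction can be deleted: the additive bias makes padded keys invisible to every query, so the padded sequence plus mask reproduces the compacted result while keeping static shapes through the pipeline.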

Why

Before this change, FLUX.2-dev still relied on a tokenizer-side workaround, while FLUX.2-Klein already had an explicit masking path. This PR removes that inconsistency and makes FLUX.2 prompt masking behavior easier to reason about and maintain across model variants.

Testing

./bazelw run //max/tests/integration/accuracy:verify_pipelines -- \
  --pipeline black-forest-labs/FLUX.2-dev-t2i-bfloat16-v2 \
  --devices gpu:0 \
  --pixel-results-dir ../results

Checklist

  • PR is small and focused — consider splitting larger changes into a
    sequence of smaller PRs
  • I ran ./bazelw run format to format my changes
  • I added or updated tests to cover my changes
  • If AI tools assisted with this contribution, I have included an
    Assisted-by: trailer in my commit message or this PR description
    (see AI Tool Use Policy)
    Assisted-by: codex