[MAX] Align FLUX.2-dev prompt masking with diffusers by pei0033 · Pull Request #6331 · modular/modular

Summary

This PR extends the masking approach from PR #6153 to the FLUX.2-dev/Mistral3 text-encoder path.

Instead of compacting padded tokens in PixelGenerationTokenizer, we now preserve the padded prompt sequence and pass the tokenizer-generated attention_mask through the FLUX.2 pipeline into the Mistral3 text encoder. The text encoder materializes an additive attention bias from that mask and uses masked flash attention, matching the masking model already used for FLUX.2-Klein.
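In outline, the mask-to-bias step works like the minimal NumPy sketch below. The function names, shapes, and the -1e9 fill value are illustrative only, not the pipeline's actual code; the real path uses the shared helper and flash attention kernels described in this PR:

```python
import numpy as np

def attention_bias_from_mask(attention_mask: np.ndarray) -> np.ndarray:
    """Turn a 0/1 tokenizer attention mask [batch, seq] into an additive
    bias [batch, 1, 1, seq]: 0.0 at attended keys, a large negative
    value at padded keys (hypothetical stand-in for the shared helper)."""
    bias = (1.0 - attention_mask.astype(np.float32)) * -1e9
    return bias[:, None, None, :]

def masked_attention(q, k, v, bias):
    """Plain scaled dot-product attention with an additive key bias.
    q, k, v: [batch, heads, seq, head_dim]; bias broadcasts over
    heads and query positions, so padded keys get ~zero weight."""
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    scores = scores + bias
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v
```

Because the bias is added before the softmax, padded key positions receive effectively zero attention weight, so padding can no longer leak into the prompt embeddings.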

What Changed

  • removed the FLUX.2-specific token compaction workaround from PixelGenerationTokenizer
  • threaded attention_mask through flux2 and flux2_modulev3 prompt-embedding paths
  • added a shared attention_bias_from_attention_mask_array(...) helper
  • updated graph and modulev3 Mistral3 text encoders to consume additive attention bias
  • switched Mistral3 text attention to masked flash attention
  • reused the shared helper in the existing Qwen3 text-encoder path for consistency
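At the attention level, the removed compaction workaround and the new mask-based path compute the same thing for the real tokens. A hypothetical NumPy check of that equivalence (single head, no positional encoding; all names here are illustrative, not the repo's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention(q, k, v, bias=0.0):
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1]) + bias
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq, real, d = 6, 4, 8  # 6 tokens total, last 2 are padding
q = rng.standard_normal((seq, d))
k = rng.standard_normal((seq, d))
v = rng.standard_normal((seq, d))

# Old path: compact away padded tokens before attention.
compacted = attention(q[:real], k[:real], v[:real])

# New path: keep padding, mask padded keys with a large negative bias.
mask = np.zeros(seq)
mask[:real] = 1.0
bias = (1.0 - mask) * -1e9  # [seq], broadcasts over query positions
masked = attention(q, k, v, bias)

# Outputs at the real positions agree; padding never influences them.
assert np.allclose(compacted, masked[:real], atol=1e-6)
```

This is why the tokenizer-side compaction can be deleted: the additive bias makes padded keys invisible to every query, so the padded sequence plus mask reproduces the compacted result while keeping static shapes through the pipeline.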

Why

Before this change, FLUX.2-dev still relied on a tokenizer-side workaround, while FLUX.2-Klein already had an explicit masking path. This PR removes that inconsistency and makes FLUX.2 prompt masking behavior easier to reason about and maintain across model variants.

Testing

./bazelw run //max/tests/integration/accuracy:verify_pipelines -- \
  --pipeline black-forest-labs/FLUX.2-dev-t2i-bfloat16-v2 \
  --devices gpu:0 \
  --pixel-results-dir ../results

Checklist

  • PR is small and focused — consider splitting larger changes into a
    sequence of smaller PRs
  • I ran ./bazelw run format to format my changes
  • I added or updated tests to cover my changes
  • If AI tools assisted with this contribution, I have included an
    Assisted-by: trailer in my commit message or this PR description
    (see AI Tool Use Policy)
    Assisted-by: codex