[MAX] Align FLUX.2-dev prompt masking with diffusers by pei0033 · Pull Request #6331 · modular/modular
Summary
This PR extends the masking approach from PR #6153 to the FLUX.2-dev/Mistral3 text-encoder path.
Instead of compacting padded tokens in `PixelGenerationTokenizer`, we now preserve the padded prompt sequence and pass the tokenizer-generated `attention_mask` through the FLUX.2 pipeline into the Mistral3 text encoder. The text encoder materializes an additive attention bias from that mask and uses masked flash attention, matching the masking approach already used for FLUX.2-Klein.
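As a rough illustration of what "materializes an additive attention bias" means here, the sketch below converts a 0/1 tokenizer `attention_mask` into an additive bias: 0.0 at real token positions and a large negative value at padded positions, so that adding the bias to attention logits before softmax zeroes out attention to padding. The function name echoes the helper added in this PR, but the signature, shapes, and NumPy implementation are illustrative assumptions, not the actual MAX code.

```python
import numpy as np

def attention_bias_from_attention_mask(mask: np.ndarray,
                                       dtype=np.float32) -> np.ndarray:
    """Hypothetical sketch: turn a (batch, seq_len) 0/1 attention mask into
    an additive bias of shape (batch, 1, 1, seq_len) that broadcasts over
    heads and query positions: 0.0 where attending is allowed, a large
    negative value where the key position is padding."""
    bias = np.where(mask.astype(bool), 0.0, np.finfo(dtype).min).astype(dtype)
    return bias[:, None, None, :]

# 3 real tokens followed by 2 pad tokens
mask = np.array([[1, 1, 1, 0, 0]])
bias = attention_bias_from_attention_mask(mask)
print(bias.shape)  # (1, 1, 1, 5)
```

Keeping the bias additive (rather than a boolean mask applied after the fact) lets the same tensor be consumed directly by attention kernels that accept a bias term.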
What Changed
- removed the FLUX.2-specific token compaction workaround from `PixelGenerationTokenizer`
- threaded `attention_mask` through the `flux2` and `flux2_modulev3` prompt-embedding paths
- added a shared `attention_bias_from_attention_mask_array(...)` helper
- updated graph and modulev3 Mistral3 text encoders to consume the additive attention bias
- switched Mistral3 text attention to masked flash attention
- reused the shared helper in the existing Qwen3 text-encoder path for consistency
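To show how the pieces above fit together, here is a minimal NumPy sketch of masked scaled-dot-product attention consuming such an additive bias: padded key positions receive a huge negative logit, so their softmax weight is numerically zero. All names, shapes, and the reference-style softmax are illustrative assumptions; the PR itself uses a masked flash-attention kernel, not this naive implementation.

```python
import numpy as np

def attention_bias(mask, dtype=np.float32):
    # 0.0 where attending is allowed, large negative where masked (hypothetical helper)
    bias = np.where(mask.astype(bool), 0.0, np.finfo(dtype).min).astype(dtype)
    return bias[:, None, None, :]

def masked_attention(q, k, v, bias):
    # q, k, v: (batch, heads, seq_len, head_dim); bias broadcast-adds into logits
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores + bias                          # padded keys pushed toward -inf
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 2, 5, 4)).astype(np.float32)
k = rng.standard_normal((1, 2, 5, 4)).astype(np.float32)
v = rng.standard_normal((1, 2, 5, 4)).astype(np.float32)
mask = np.array([[1, 1, 1, 0, 0]])                  # last two positions are padding

out, weights = masked_attention(q, k, v, attention_bias(mask))
# attention weights over the two padded key positions are numerically zero
```

A fused flash-attention kernel computes the same result tile by tile without materializing the full weight matrix, which is why the encoders only need to hand it the additive bias.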
Why
Before this change, FLUX.2-dev still relied on a tokenizer-side workaround, while FLUX.2-Klein already had an explicit masking path. This PR removes that inconsistency and makes FLUX.2 prompt masking behavior easier to reason about and maintain across model variants.
Testing
```
./bazelw run //max/tests/integration/accuracy:verify_pipelines -- --pipeline black-forest-labs/FLUX.2-dev-t2i-bfloat16-v2 --devices gpu:0 --pixel-results-dir ../results
```
Checklist
- PR is small and focused — consider splitting larger changes into a sequence of smaller PRs
- I ran `./bazelw run format` to format my changes
- I added or updated tests to cover my changes
- If AI tools assisted with this contribution, I have included an `Assisted-by:` trailer in my commit message or this PR description (see AI Tool Use Policy)
Assisted-by: codex