Feature: Add Z-Image-Turbo model support by Pfannkuchensack · Pull Request #8671 · invoke-ai/InvokeAI

@Pfannkuchensack

Add comprehensive support for Z-Image-Turbo (S3-DiT) models including:

Backend:
- New BaseModelType.ZImage in taxonomy
- Z-Image model config classes (ZImageTransformerConfig, Qwen3TextEncoderConfig)
- Model loader for Z-Image transformer and Qwen3 text encoder
- Z-Image conditioning data structures
- Step callback support for Z-Image with FLUX latent RGB factors

Invocations:
- z_image_model_loader: Load Z-Image transformer and Qwen3 encoder
- z_image_text_encoder: Encode prompts using Qwen3 with chat template
- z_image_denoise: Flow matching denoising with time-shifted sigmas
- z_image_image_to_latents: Encode images to 16-channel latents
- z_image_latents_to_image: Decode latents using FLUX VAE

Frontend:
- Z-Image graph builder for text-to-image generation
- Model picker and validation updates for z-image base type
- CFG scale now allows 0 (required for Z-Image-Turbo)
- Clip skip disabled for Z-Image (uses Qwen3, not CLIP)
- Optimal dimension settings for Z-Image (1024x1024)

Technical details:
- Uses Qwen3 text encoder (not CLIP/T5)
- 16 latent channels with FLUX-compatible VAE
- Flow matching scheduler with dynamic time shift
- 8 inference steps recommended for Turbo variant
- bfloat16 inference dtype
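
A minimal sketch of what "time-shifted sigmas" can look like for a flow matching schedule. The commit message does not spell out the formula, so this assumes the static shift form commonly used by flow matching schedulers, `shift * s / (1 + (shift - 1) * s)`. The function name `shifted_sigmas` and the default `shift=3.0` are hypothetical and are not taken from this PR.

```python
def shifted_sigmas(num_steps: int, shift: float = 3.0) -> list[float]:
    """Return num_steps + 1 sigmas running from 1.0 (pure noise) down to 0.0.

    Assumption: a static time shift. The PR mentions a *dynamic* shift,
    whose exact parameters are not given in the commit message.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    # Linear schedule from 1.0 to 0.0 inclusive.
    linear = [1.0 - i / num_steps for i in range(num_steps + 1)]
    # A shift > 1 keeps sigma higher for longer, so more of the step budget
    # is spent at high noise levels.
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in linear]
```

With `shift=1.0` the schedule reduces to the plain linear one; larger values bend it towards the high-noise end while keeping the endpoints at 1.0 and 0.0.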

@Pfannkuchensack

Add comprehensive LoRA support for Z-Image models including:

Backend:
- New Z-Image LoRA config classes (LoRA_LyCORIS_ZImage_Config, LoRA_Diffusers_ZImage_Config)
- Z-Image LoRA conversion utilities with key mapping for transformer and Qwen3 encoder
- LoRA prefix constants (Z_IMAGE_LORA_TRANSFORMER_PREFIX, Z_IMAGE_LORA_QWEN3_PREFIX)
- LoRA detection logic to distinguish Z-Image from Flux models
- Layer patcher improvements for proper dtype conversion and parameter

@Pfannkuchensack

…ntification

Move the Flux layer-structure check ahead of the metadata check so that Z-Image LoRAs (which use `diffusion_model.layers.X`) are not misidentified as the Flux AI Toolkit format. Flux models use the `double_blocks` and `single_blocks` key patterns, which are now checked first regardless of whether metadata is present.
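
The ordering described above can be sketched roughly as follows. This is an illustration only: the function name `guess_lora_family` and the example key strings are hypothetical, and the real detection code in the PR also consults metadata, which is omitted here.

```python
import re
from typing import Iterable

# Z-Image LoRAs use keys of the form `diffusion_model.layers.X...`.
_Z_IMAGE_LAYER_RE = re.compile(r"diffusion_model\.layers\.\d+\.")


def guess_lora_family(keys: Iterable[str]) -> str:
    """Classify a LoRA state dict by its key names (hypothetical helper)."""
    keys = list(keys)
    # 1. Flux layer structure is checked first, regardless of metadata.
    if any("double_blocks" in k or "single_blocks" in k for k in keys):
        return "flux"
    # 2. Only then look for the Z-Image layer pattern.
    if any(_Z_IMAGE_LAYER_RE.search(k) for k in keys):
        return "z-image"
    return "unknown"
```

Checking the Flux block names first means a state dict is treated as Flux only when it actually contains Flux layer names, not merely because its metadata resembles the AI Toolkit format.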
…ibility

Add comprehensive support for GGUF quantized Z-Image models and improve component flexibility:

Backend:
- New Main_GGUF_ZImage_Config for GGUF quantized Z-Image transformers
- Z-Image key detection (_has_z_image_keys) to identify S3-DiT models
- GGUF quantization detection and sidecar LoRA patching for quantized models
- Qwen3Encoder_Qwen3Encoder_Config for standalone Qwen3 encoder models

Model Loader:
- Split Z-Image model

@Pfannkuchensack

…inModelConfig

The FLUX Dev license warning in model pickers used isCheckpointMainModelConfig
incorrectly:
```
isCheckpointMainModelConfig(config) && config.variant === 'dev'
```

This caused a TypeScript error because the CheckpointModelConfig type does not
include the 'variant' property: it is extracted as `{ type: 'main'; format:
'checkpoint' }`, which does not narrow to a type that has 'variant'.

Changes:
- Add isFluxDevMainModelConfig type guard that properly checks
  base='flux' AND variant='dev', returning MainModelConfig
- Update MainModelPicker and InitialStateMainModelPicker to use new guard
- Remove isCheckpointMainModelConfig as it had no other usages

The function was removed because:
1. It was only used for detecting FLUX Dev models (incorrect use case)
2. No other code needs a generic "is checkpoint format" check
3. The pattern in this codebase is specific type guards per model variant
   (isFluxFillMainModelModelConfig, isRefinerMainModelModelConfig, etc.)
…ters

- Add Qwen3EncoderGGUFLoader for llama.cpp GGUF quantized text encoders
- Convert llama.cpp key format (blk.X., token_embd) to PyTorch format
- Handle tied embeddings (lm_head.weight ↔ embed_tokens.weight)
- Dequantize embed_tokens for embedding lookups (GGMLTensor limitation)
- Add QK normalization key mappings (q_norm, k_norm) for Qwen3
- Set Z-Image defaults: steps=9, cfg_scale=0.0, width/height=1024
- Allow cfg_scale >= 0 (was >= 1) for Z-Image Turbo compatibility
- Add GGUF format detection for Qwen3 model probing
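
A rough sketch of the llama.cpp to PyTorch key conversion and the tied-embedding handling listed above. The mapping tables assume the usual llama.cpp GGUF tensor names and Hugging Face style Qwen3 names with a `model.` prefix; the exact target names used by `Qwen3EncoderGGUFLoader` are not shown in the commit message, and `convert_key` and `convert_state_dict` are hypothetical helper names.

```python
import re
from typing import Any

# Assumed top-level name mapping (llama.cpp -> Hugging Face style).
_TOP_LEVEL = {
    "token_embd.weight": "model.embed_tokens.weight",
    "output_norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}

# Assumed per-block name mapping, including the Qwen3 QK normalization keys.
_BLOCK = {
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "attn_output": "self_attn.o_proj",
    "attn_q_norm": "self_attn.q_norm",
    "attn_k_norm": "self_attn.k_norm",
    "attn_norm": "input_layernorm",
    "ffn_norm": "post_attention_layernorm",
    "ffn_gate": "mlp.gate_proj",
    "ffn_up": "mlp.up_proj",
    "ffn_down": "mlp.down_proj",
}

_BLOCK_RE = re.compile(r"^blk\.(\d+)\.([a-z_]+)\.(weight|bias)$")


def convert_key(key: str) -> str:
    """Translate one llama.cpp tensor name; unknown names pass through."""
    if key in _TOP_LEVEL:
        return _TOP_LEVEL[key]
    match = _BLOCK_RE.match(key)
    if match and match.group(2) in _BLOCK:
        layer, name, suffix = match.groups()
        return f"model.layers.{layer}.{_BLOCK[name]}.{suffix}"
    return key


def convert_state_dict(state_dict: dict[str, Any]) -> dict[str, Any]:
    """Rename all keys and alias lm_head to the embeddings when they are tied."""
    out = {convert_key(k): v for k, v in state_dict.items()}
    # Tied embeddings: the GGUF file may omit the output projection.
    if "lm_head.weight" not in out and "model.embed_tokens.weight" in out:
        out["lm_head.weight"] = out["model.embed_tokens.weight"]
    return out
```

Dequantizing `embed_tokens` for embedding lookups is left out of this sketch, since it depends on the GGMLTensor implementation.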
…rNorm

- Add CustomDiffusersRMSNorm for diffusers.models.normalization.RMSNorm
- Add CustomLayerNorm for torch.nn.LayerNorm
- Register both in AUTOCAST_MODULE_TYPE_MAPPING

Enables partial loading (enable_partial_loading: true) for Z-Image models
by wrapping their normalization layers with device autocast support.

@Pfannkuchensack

Fix DEFAULT_TOKENIZER_SOURCE so that it points to Qwen/Qwen3-4B

@Pfannkuchensack

@blessedcoolant

…noise node

The Z-Image denoise node outputs latents, not images, so these mixins
were unnecessary. Metadata and board handling are done correctly in the
L2I (latents-to-image) node. This matches how the FLUX denoise node works.

@Pfannkuchensack

The previous mixed-precision optimization for FP32 mode only converted
some VAE decoder layers (post_quant_conv, conv_in, mid_block) to the
latents dtype while leaving others (up_blocks, conv_norm_out) in float32.
This caused "expected scalar type Half but found Float" errors after
recent diffusers updates.

Simplify FP32 mode to consistently use float32 for both VAE and latents,
removing the incomplete mixed-precision logic. This trades some VRAM
usage for stability and correctness.

Also removes now-unused attention processor imports.

bachp

@blessedcoolant