Feat[model support]: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI by lstein · Pull Request #9000 · invoke-ai/InvokeAI
and others added 4 commits
March 26, 2026 21:52

Adds full support for the Qwen Image Edit 2511 model architecture, including both the diffusers version (Qwen/Qwen-Image-Edit-2511) and GGUF quantized versions (unsloth/Qwen-Image-Edit-2511-GGUF).

Backend changes:
- Add QwenImageEdit base model type to taxonomy
- Add diffusers and GGUF model config classes with detection logic
- Add model loader for diffusers and GGUF formats
- Add 5 invocation nodes: model loader, text/vision encoder, denoise, image-to-latents, latents-to-image
- Add QwenVLEncoderField for Qwen2.5-VL vision-language encoder
- Add QwenImageEditConditioningInfo and conditioning field
- Add generation modes and step callback support
- Add 5 starter models (full diffusers + Q2_K, Q4_K_M, Q6_K, Q8_0 GGUF)

Frontend changes:
- Add graph builder for linear UI generation
- Register in canvas and generate enqueue hooks
- Update type definitions, optimal dimensions, grid sizes
- Add readiness validation, model picker grouping, clip skip config
- Regenerate OpenAPI schema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use AutoProcessor.from_pretrained to load Qwen VL processor correctly

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Agent-Logs-Url: https://github.com/lstein/InvokeAI/sessions/4d4417be-0f61-4faa-a21c-16e9ce81fec7

chore: bump diffusers==0.37.1

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Agent-Logs-Url: https://github.com/lstein/InvokeAI/sessions/38a76809-d9a3-40f1-b5b3-fb56342e8e90

fix: handle multiple reference images

feature: add text encoder selection to advanced section for Qwen Image Edit

feat: complete Qwen Image Edit pipeline with LoRA, GGUF, quantization, and UI support

Major additions:
- LoRA support: loader invocation, config detection, conversion utils, prefix constants, and LayerPatcher integration in denoise with sidecar patching for GGUF models
- Lightning LoRA: starter models (4-step and 8-step bf16), shift override parameter for the distilled sigma schedule
- GGUF fixes: correct base class (ModelLoader), zero_cond_t=True, correct in_channels (no /4 division)
- Denoise: use FlowMatchEulerDiscreteScheduler directly, proper CFG gating (skip negative when cfg<=1), reference latent pixel-space resize
- I2L: resize reference image to generation dimensions before VAE encoding
- Graph builder: wire LoRAs via collection loader, VAE-encode reference image as latents for spatial conditioning, pass shift/quantization params
- Frontend: shift override (checkbox+slider), LoRA graph wiring, scheduler hidden for Qwen Image Edit, model switching cleanup
- Starter model bundle for Qwen Image Edit
- LoRA config registered in discriminated union (factory.py)
- Downgrade transformers requirement back to >=4.56.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
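The "proper CFG gating (skip negative when cfg<=1)" point rests on a small identity: at a guidance scale of exactly 1, classifier-free guidance reduces to the conditional prediction, so the negative-prompt forward pass adds nothing. A minimal scalar sketch of the gate (the real node operates on latent tensors; the function names here are hypothetical):

```python
def apply_cfg(pred_cond: float, pred_uncond: float, cfg_scale: float) -> float:
    """Classifier-free guidance with gating. At cfg_scale == 1 the CFG
    formula collapses to pred_cond exactly, so at or below 1 the negative
    branch can be skipped without changing the result."""
    if cfg_scale <= 1.0:
        return pred_cond
    return pred_uncond + cfg_scale * (pred_cond - pred_uncond)


def needs_negative_pass(cfg_scale: float) -> bool:
    """Gate for whether to run the negative-prompt forward pass at all."""
    return cfg_scale > 1.0
```

Skipping the unconditional pass when the gate is closed halves the transformer work per step for ungated generations.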
- GGUF loader: handle zero_cond_t absence in diffusers 0.36, try dtype before torch_dtype for forward compat
- Denoise: load scheduler config from disk with GGUF fallback, inline calculate_shift to avoid pipeline import, remove deprecated txt_seq_lens
- Text encoder: resize reference images to ~512x512 before VL encoding to prevent vision tokens from overwhelming the text prompt
- Picker badges: wrap to next line instead of truncating labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
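The ~512x512 resize before VL encoding is an area-targeted rescale: shrink (or grow) the reference so its pixel count lands near 512x512 while keeping the aspect ratio, then round each side to the vision encoder's patch-grid multiple. A sketch under assumptions — the target area and the multiple of 28 (Qwen2.5-VL's merged-patch size) are guesses at the real constants, and the helper name is hypothetical:

```python
import math


def resize_for_vl_encoder(width: int, height: int,
                          target_area: int = 512 * 512,
                          multiple: int = 28) -> tuple[int, int]:
    """Scale so width*height is approximately target_area, preserving
    aspect ratio, with both sides rounded to the patch multiple so the
    vision token count stays bounded and grid-aligned."""
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h
```

A 2048x1024 reference comes out near 728x364: roughly the target area, same 2:1 aspect ratio, both sides divisible by 28.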
- Remove module-level cache for quantized encoders — load fresh each invocation and free VRAM via cleanup callback (gc + empty_cache)
- Suppress harmless BnB MatMul8bitLt bfloat16→float16 cast warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
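The cache removal describes a load-per-invocation lifecycle with explicit reclamation at the end. A minimal sketch of that pattern (names are hypothetical; InvokeAI's actual cleanup-callback API differs):

```python
import gc


class EncoderSession:
    """Sketch: load a quantized text encoder for one invocation instead
    of caching it at module level, and free it on exit so its VRAM is
    reclaimed before the next node runs."""

    def __init__(self, loader):
        self._loader = loader
        self.encoder = None

    def __enter__(self):
        self.encoder = self._loader()
        return self.encoder

    def __exit__(self, *exc):
        # Drop the Python reference and collect first: empty_cache() can
        # only release blocks that no live tensor still occupies.
        self.encoder = None
        gc.collect()
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass
        return False
```

The ordering matters: `gc.collect()` before `torch.cuda.empty_cache()`, since cached allocator blocks are only returned to the driver once nothing references them.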
lstein
changed the title
feat: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI
Feat[model support]: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI
Rename the base model type from "qwen-image-edit" to "qwen-image" to
reflect that the Qwen Image family includes both txt2img and image
editing models. The edit models are a specific use case within the
broader Qwen Image architecture.
- BaseModelType.QwenImageEdit -> BaseModelType.QwenImage ("qwen-image")
- All Python files, classes, variables, and invocation names renamed
- All TypeScript/React components, selectors, and state fields renamed
- Frontend display: "Qwen Image" in model picker, "QwenImg" badge
- Starter model bundle: "Qwen Image"
- File renames: qwen_image_edit_* -> qwen_image_*
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- QwenImageVariantType enum: Generate (txt2img) and Edit (image editing)
- Diffusers models: auto-detect variant from model_index.json pipeline class (QwenImagePipeline → Generate, QwenImageEditPlusPipeline → Edit)
- GGUF models: default to Generate (can't detect from state dict)
- Frontend: hide reference image panel when a Generate variant is selected
- Variant display names: "Qwen Image" / "Qwen Image Edit"
- ModelRecordChanges: include QwenImageVariantType in variant union

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
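The variant auto-detection can be sketched in a few lines, assuming the pipeline class is read from the `_class_name` field that diffusers records in `model_index.json` (function and mapping names are hypothetical; the string variant values stand in for the real enum):

```python
import json
from pathlib import Path

# Hypothetical mapping; the commit keys off the pipeline class that
# diffusers writes into model_index.json.
PIPELINE_TO_VARIANT = {
    "QwenImagePipeline": "generate",
    "QwenImageEditPlusPipeline": "edit",
}


def detect_variant(model_dir: str) -> str:
    """Return the Qwen Image variant for a model directory.

    Diffusers layouts carry a model_index.json whose _class_name names
    the pipeline; single-file GGUF checkpoints have no such metadata, so
    they fall back to the txt2img ("generate") variant, as the commit
    describes."""
    index = Path(model_dir) / "model_index.json"
    if not index.exists():
        return "generate"
    cls = json.loads(index.read_text()).get("_class_name", "")
    return PIPELINE_TO_VARIANT.get(cls, "generate")
```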
The variant field with a default value was appended to the discriminator tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model detection for GGUF and Diffusers models. Making variant optional with default=None restores the correct tags (main.gguf_quantized.qwen-image). The variant is still set during Diffusers model probing via _get_qwen_image_variant() and can be manually set for GGUF models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
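The failure mode is easiest to see in a toy tag builder: a field with a concrete default contributes a trailing segment to the discriminator tag, while a None default is omitted, leaving the registered tag unchanged. This is a hypothetical reconstruction, not InvokeAI's actual tag machinery:

```python
from typing import Optional


def build_tag(type_: str, fmt: str, base: str,
              variant: Optional[str] = None) -> str:
    """Toy reconstruction of the bug: every populated field becomes a tag
    segment, so a variant with a concrete default grows the tag and no
    longer matches the registered discriminator. With default=None the
    tag stays stable, and variant is filled in during probing instead."""
    parts = [type_, fmt, base]
    if variant is not None:
        parts.append(variant)
    return ".".join(parts)
```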
Prevents variable name collisions when the txt2img branch adds qwen_image_* variables for the Qwen Image 2512 models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lstein
marked this pull request as draft
- Add optional variant field to StarterModelWithoutDependencies
- Tag all Qwen Image Edit starter models (Diffusers + GGUF) with variant=QwenImageVariantType.Edit
- Frontend passes variant through to the install endpoint config so GGUF edit models get the correct variant set on install

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the transformer to double the timestep batch and create modulation indices for non-existent reference patches, producing noise output. Now checks the config variant before enabling it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which contain "transformer_blocks." as a substring, falsely matching the Qwen Image LoRA detection. Add single_transformer_blocks to the Flux exclusion set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
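The substring pitfall and the fix can be sketched as follows (the exclusion-set contents and helper name are assumptions for illustration):

```python
# Assumed exclusion set: block names that exist only in Flux transformers.
FLUX_ONLY_SUBSTRINGS = ("single_transformer_blocks.",)


def looks_like_qwen_image_lora(state_dict_keys) -> bool:
    """Sketch of the fixed detection: "transformer_blocks." alone is not
    sufficient evidence, because "single_transformer_blocks." (a
    Flux-only block name) contains it as a substring."""
    saw_match = False
    for key in state_dict_keys:
        if any(s in key for s in FLUX_ONLY_SUBSTRINGS):
            return False  # definitely a Flux LoRA, bail out immediately
        if "transformer_blocks." in key:
            saw_match = True
    return saw_match
```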
lstein
marked this pull request as ready for review
Previously the graph builder passed the output canvas dimensions to the I2L node, which resized the reference image to match — distorting its aspect ratio when they differed. Now the reference is encoded at its native size. The denoise node already handles dimension mismatches via bilinear interpolation in latent space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
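The latent-space fallback is ordinary bilinear interpolation. A pure-Python sketch on a single latent channel (the real node operates on 4-D torch tensors; the align-corners style mapping here is an arbitrary illustrative choice):

```python
def bilinear_resize(grid, out_h, out_w):
    """Bilinearly resize a 2-D grid (one latent channel), illustrating
    how a mismatched reference latent can be adapted to the generation
    dimensions without distorting the source image in pixel space."""
    in_h, in_w = len(grid), len(grid[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # Corner-aligned mapping: grid corners map to grid corners.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0, ty = int(y), y - int(y)
        y1 = min(y0 + 1, in_h - 1)
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0, tx = int(x), x - int(x)
            x1 = min(x0 + 1, in_w - 1)
            top = grid[y0][x0] * (1 - tx) + grid[y0][x1] * tx
            bot = grid[y1][x0] * (1 - tx) + grid[y1][x1] * tx
            out[i][j] = top * (1 - ty) + bot * ty
    return out
```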
… without component source

Addresses two reviewer findings:

1. denoising_start/denoising_end were ignored — the full sigma schedule was always used regardless of img2img strength. Now clip the scheduler's sigmas to the fractional range before stepping, and use manual Euler steps with the clipped schedule (scheduler.step() can't handle clipped schedules due to internal index tracking).
2. GGUF Qwen Image models could be enqueued without a Component Source, deferring the error to runtime. Added readiness checks on both the Generate and Canvas tabs that block enqueue when a GGUF model is selected but no Diffusers component source is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
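The denoising-range fix can be sketched with scalar sigmas: slice the schedule to the fractional [denoising_start, denoising_end] range, then advance with plain Euler updates of the flow-matching ODE rather than scheduler.step(), which keeps an internal step index that a clipped schedule would desynchronize. Helper names are hypothetical; the real sigmas are tensors:

```python
def clip_sigmas(sigmas, denoising_start, denoising_end):
    """Keep only the slice of the sigma schedule covering the fractional
    denoising range. sigmas has num_steps + 1 entries, ending at the
    terminal value (0.0 for flow matching)."""
    n = len(sigmas) - 1
    lo = int(round(denoising_start * n))
    hi = int(round(denoising_end * n))
    return sigmas[lo:hi + 1]


def euler_step(latents, noise_pred, sigma, sigma_next):
    """One manual Euler step of the flow-matching ODE:
    x_next = x + (sigma_next - sigma) * v_pred."""
    return latents + (sigma_next - sigma) * noise_pred
```

With a 4-step schedule `[1.0, 0.75, 0.5, 0.25, 0.0]` and `denoising_start=0.5`, only the last two steps are executed — the img2img-strength behavior the review asked for.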
lstein
marked this pull request as draft
Denoise tests (13):
- _prepare_cfg_scale: scalar, list, mismatch, invalid type
- _compute_sigmas: default schedule, shift override, terminal stretch, monotonicity, step counts, image_seq_len affecting mu
- _pack/_unpack_latents: roundtrip, shape verification

Model loader tests (5):
- Diffusers model extracts all components from itself
- Diffusers model ignores component_source when provided
- GGUF with Diffusers component source succeeds
- GGUF without component source raises ValueError
- GGUF with non-Diffusers source raises ValueError

Text encoder tests (13):
- _build_prompt: 0/1/many images, template structure, special chars
- _resize_for_vl_encoder: large/small images, aspect ratio preservation, dimension rounding, square/portrait/landscape orientations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
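The _pack/_unpack_latents roundtrip being tested can be illustrated with a minimal pure-Python sketch: 2x2 spatial patches of the latent grid are flattened into a token sequence for the transformer, then restored afterward. The actual nodes operate on torch tensors and the exact patch layout here is an assumption:

```python
def pack_latents(lat):
    """lat: nested list of shape (c, h, w) with even h and w. Returns a
    sequence of h//2 * w//2 tokens, each holding one 2x2 patch from
    every channel (c * 4 values per token)."""
    c, h, w = len(lat), len(lat[0]), len(lat[0][0])
    seq = []
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            patch = []
            for ch in range(c):
                patch += [lat[ch][i][j], lat[ch][i][j + 1],
                          lat[ch][i + 1][j], lat[ch][i + 1][j + 1]]
            seq.append(patch)
    return seq


def unpack_latents(seq, c, h, w):
    """Inverse of pack_latents: scatter each token's 2x2 patches back
    onto the (c, h, w) latent grid."""
    lat = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    idx = 0
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            patch = seq[idx]
            idx += 1
            for ch in range(c):
                base = ch * 4
                lat[ch][i][j] = patch[base]
                lat[ch][i][j + 1] = patch[base + 1]
                lat[ch][i + 1][j] = patch[base + 2]
                lat[ch][i + 1][j + 1] = patch[base + 3]
    return lat
```

A roundtrip assertion of exactly this shape is what the listed denoise tests verify.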