Feat[model support]: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI by lstein · Pull Request #9000 · invoke-ai/InvokeAI
and others added 4 commits
March 26, 2026 21:52

Adds full support for the Qwen Image Edit 2511 model architecture, including both the diffusers version (Qwen/Qwen-Image-Edit-2511) and GGUF quantized versions (unsloth/Qwen-Image-Edit-2511-GGUF).

Backend changes:
- Add QwenImageEdit base model type to taxonomy
- Add diffusers and GGUF model config classes with detection logic
- Add model loader for diffusers and GGUF formats
- Add 5 invocation nodes: model loader, text/vision encoder, denoise, image-to-latents, latents-to-image
- Add QwenVLEncoderField for Qwen2.5-VL vision-language encoder
- Add QwenImageEditConditioningInfo and conditioning field
- Add generation modes and step callback support
- Add 5 starter models (full diffusers + Q2_K, Q4_K_M, Q6_K, Q8_0 GGUF)

Frontend changes:
- Add graph builder for linear UI generation
- Register in canvas and generate enqueue hooks
- Update type definitions, optimal dimensions, grid sizes
- Add readiness validation, model picker grouping, clip skip config
- Regenerate OpenAPI schema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use AutoProcessor.from_pretrained to load Qwen VL processor correctly

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Agent-Logs-Url: https://github.com/lstein/InvokeAI/sessions/4d4417be-0f61-4faa-a21c-16e9ce81fec7

chore: bump diffusers==0.37.1

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Agent-Logs-Url: https://github.com/lstein/InvokeAI/sessions/38a76809-d9a3-40f1-b5b3-fb56342e8e90

fix: handle multiple reference images

feature: add text encoder selection to advanced section for Qwen Image Edit

feat: complete Qwen Image Edit pipeline with LoRA, GGUF, quantization, and UI support

Major additions:
- LoRA support: loader invocation, config detection, conversion utils, prefix constants, and LayerPatcher integration in denoise with sidecar patching for GGUF models
- Lightning LoRA: starter models (4-step and 8-step bf16), shift override parameter for the distilled sigma schedule
- GGUF fixes: correct base class (ModelLoader), zero_cond_t=True, correct in_channels (no /4 division)
- Denoise: use FlowMatchEulerDiscreteScheduler directly, proper CFG gating (skip negative when cfg<=1), reference latent pixel-space resize
- I2L: resize reference image to generation dimensions before VAE encoding
- Graph builder: wire LoRAs via collection loader, VAE-encode reference image as latents for spatial conditioning, pass shift/quantization params
- Frontend: shift override (checkbox+slider), LoRA graph wiring, scheduler hidden for Qwen Image Edit, model switching cleanup
- Starter model bundle for Qwen Image Edit
- LoRA config registered in discriminated union (factory.py)
- Downgrade transformers requirement back to >=4.56.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
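The "proper CFG gating (skip negative when cfg<=1)" point rests on a small identity: at a guidance scale of exactly 1, classifier-free guidance reduces to the conditional prediction, so the negative-prompt forward pass adds nothing. A minimal scalar sketch of the gate (the real node operates on latent tensors; the function names here are hypothetical):

```python
def apply_cfg(pred_cond: float, pred_uncond: float, cfg_scale: float) -> float:
    """Classifier-free guidance with gating. At cfg_scale == 1 the CFG
    formula collapses to pred_cond exactly, so at or below 1 the negative
    branch can be skipped without changing the result."""
    if cfg_scale <= 1.0:
        return pred_cond
    return pred_uncond + cfg_scale * (pred_cond - pred_uncond)


def needs_negative_pass(cfg_scale: float) -> bool:
    """Gate for whether to run the negative-prompt forward pass at all."""
    return cfg_scale > 1.0
```

Skipping the unconditional pass when the gate is closed halves the transformer work per step for ungated generations.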
- GGUF loader: handle zero_cond_t absence in diffusers 0.36, try dtype before torch_dtype for forward compat
- Denoise: load scheduler config from disk with GGUF fallback, inline calculate_shift to avoid pipeline import, remove deprecated txt_seq_lens
- Text encoder: resize reference images to ~512x512 before VL encoding to prevent vision tokens from overwhelming the text prompt
- Picker badges: wrap to next line instead of truncating labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
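The ~512x512 resize before VL encoding is an area-targeted rescale: shrink (or grow) the reference so its pixel count lands near 512x512 while keeping the aspect ratio, then round each side to the vision encoder's patch-grid multiple. A sketch under assumptions — the target area and the multiple of 28 (Qwen2.5-VL's merged-patch size) are guesses at the real constants, and the helper name is hypothetical:

```python
import math


def resize_for_vl_encoder(width: int, height: int,
                          target_area: int = 512 * 512,
                          multiple: int = 28) -> tuple[int, int]:
    """Scale so width*height is approximately target_area, preserving
    aspect ratio, with both sides rounded to the patch multiple so the
    vision token count stays bounded and grid-aligned."""
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h
```

A 2048x1024 reference comes out near 728x364: roughly the target area, same 2:1 aspect ratio, both sides divisible by 28.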
- Remove module-level cache for quantized encoders — load fresh each invocation and free VRAM via cleanup callback (gc + empty_cache)
- Suppress harmless BnB MatMul8bitLt bfloat16→float16 cast warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
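The cache removal describes a load-per-invocation lifecycle with explicit reclamation at the end. A minimal sketch of that pattern (names are hypothetical; InvokeAI's actual cleanup-callback API differs):

```python
import gc


class EncoderSession:
    """Sketch: load a quantized text encoder for one invocation instead
    of caching it at module level, and free it on exit so its VRAM is
    reclaimed before the next node runs."""

    def __init__(self, loader):
        self._loader = loader
        self.encoder = None

    def __enter__(self):
        self.encoder = self._loader()
        return self.encoder

    def __exit__(self, *exc):
        # Drop the Python reference and collect first: empty_cache() can
        # only release blocks that no live tensor still occupies.
        self.encoder = None
        gc.collect()
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass
        return False
```

The ordering matters: `gc.collect()` before `torch.cuda.empty_cache()`, since cached allocator blocks are only returned to the driver once nothing references them.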
lstein
changed the title
feat: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI
Feat[model support]: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI
Rename the base model type from "qwen-image-edit" to "qwen-image" to
reflect that the Qwen Image family includes both txt2img and image
editing models. The edit models are a specific use case within the
broader Qwen Image architecture.
- BaseModelType.QwenImageEdit -> BaseModelType.QwenImage ("qwen-image")
- All Python files, classes, variables, and invocation names renamed
- All TypeScript/React components, selectors, and state fields renamed
- Frontend display: "Qwen Image" in model picker, "QwenImg" badge
- Starter model bundle: "Qwen Image"
- File renames: qwen_image_edit_* -> qwen_image_*
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- QwenImageVariantType enum: Generate (txt2img) and Edit (image editing)
- Diffusers models: auto-detect variant from model_index.json pipeline class (QwenImagePipeline → Generate, QwenImageEditPlusPipeline → Edit)
- GGUF models: default to Generate (can't detect from state dict)
- Frontend: hide reference image panel when a Generate variant is selected
- Variant display names: "Qwen Image" / "Qwen Image Edit"
- ModelRecordChanges: include QwenImageVariantType in variant union

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
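The variant auto-detection can be sketched in a few lines, assuming the pipeline class is read from the `_class_name` field that diffusers records in `model_index.json` (function and mapping names are hypothetical; the string variant values stand in for the real enum):

```python
import json
from pathlib import Path

# Hypothetical mapping; the commit keys off the pipeline class that
# diffusers writes into model_index.json.
PIPELINE_TO_VARIANT = {
    "QwenImagePipeline": "generate",
    "QwenImageEditPlusPipeline": "edit",
}


def detect_variant(model_dir: str) -> str:
    """Return the Qwen Image variant for a model directory.

    Diffusers layouts carry a model_index.json whose _class_name names
    the pipeline; single-file GGUF checkpoints have no such metadata, so
    they fall back to the txt2img ("generate") variant, as the commit
    describes."""
    index = Path(model_dir) / "model_index.json"
    if not index.exists():
        return "generate"
    cls = json.loads(index.read_text()).get("_class_name", "")
    return PIPELINE_TO_VARIANT.get(cls, "generate")
```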
The variant field with a default value was appended to the discriminator tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model detection for GGUF and Diffusers models. Making variant optional with default=None restores the correct tags (main.gguf_quantized.qwen-image). The variant is still set during Diffusers model probing via _get_qwen_image_variant() and can be manually set for GGUF models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
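The failure mode is easiest to see in a toy tag builder: a field with a concrete default contributes a trailing segment to the discriminator tag, while a None default is omitted, leaving the registered tag unchanged. This is a hypothetical reconstruction, not InvokeAI's actual tag machinery:

```python
from typing import Optional


def build_tag(type_: str, fmt: str, base: str,
              variant: Optional[str] = None) -> str:
    """Toy reconstruction of the bug: every populated field becomes a tag
    segment, so a variant with a concrete default grows the tag and no
    longer matches the registered discriminator. With default=None the
    tag stays stable, and variant is filled in during probing instead."""
    parts = [type_, fmt, base]
    if variant is not None:
        parts.append(variant)
    return ".".join(parts)
```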
Prevents variable name collisions when the txt2img branch adds qwen_image_* variables for the Qwen Image 2512 models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lstein
marked this pull request as draft
- Add optional variant field to StarterModelWithoutDependencies
- Tag all Qwen Image Edit starter models (Diffusers + GGUF) with variant=QwenImageVariantType.Edit
- Frontend passes variant through to the install endpoint config so GGUF edit models get the correct variant set on install

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the transformer to double the timestep batch and create modulation indices for non-existent reference patches, producing noise output. Now checks the config variant before enabling it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which contain "transformer_blocks." as a substring, falsely matching the Qwen Image LoRA detection. Add single_transformer_blocks to the Flux exclusion set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
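The substring pitfall and the fix can be sketched as follows (the exclusion-set contents and helper name are assumptions for illustration):

```python
# Assumed exclusion set: block names that exist only in Flux transformers.
FLUX_ONLY_SUBSTRINGS = ("single_transformer_blocks.",)


def looks_like_qwen_image_lora(state_dict_keys) -> bool:
    """Sketch of the fixed detection: "transformer_blocks." alone is not
    sufficient evidence, because "single_transformer_blocks." (a
    Flux-only block name) contains it as a substring."""
    saw_match = False
    for key in state_dict_keys:
        if any(s in key for s in FLUX_ONLY_SUBSTRINGS):
            return False  # definitely a Flux LoRA, bail out immediately
        if "transformer_blocks." in key:
            saw_match = True
    return saw_match
```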
lstein
marked this pull request as ready for review
Previously the graph builder passed the output canvas dimensions to the I2L node, which resized the reference image to match — distorting its aspect ratio when they differed. Now the reference is encoded at its native size. The denoise node already handles dimension mismatches via bilinear interpolation in latent space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
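The latent-space fallback is ordinary bilinear interpolation. A pure-Python sketch on a single latent channel (the real node operates on 4-D torch tensors; the align-corners style mapping here is an arbitrary illustrative choice):

```python
def bilinear_resize(grid, out_h, out_w):
    """Bilinearly resize a 2-D grid (one latent channel), illustrating
    how a mismatched reference latent can be adapted to the generation
    dimensions without distorting the source image in pixel space."""
    in_h, in_w = len(grid), len(grid[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # Corner-aligned mapping: grid corners map to grid corners.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0, ty = int(y), y - int(y)
        y1 = min(y0 + 1, in_h - 1)
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0, tx = int(x), x - int(x)
            x1 = min(x0 + 1, in_w - 1)
            top = grid[y0][x0] * (1 - tx) + grid[y0][x1] * tx
            bot = grid[y1][x0] * (1 - tx) + grid[y1][x1] * tx
            out[i][j] = top * (1 - ty) + bot * ty
    return out
```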
… without component source

Addresses two reviewer findings:

1. denoising_start/denoising_end were ignored — the full sigma schedule was always used regardless of img2img strength. Now clip the scheduler's sigmas to the fractional range before stepping, and use manual Euler steps with the clipped schedule (scheduler.step() can't handle clipped schedules due to internal index tracking).
2. GGUF Qwen Image models could be enqueued without a Component Source, deferring the error to runtime. Added readiness checks on both the Generate and Canvas tabs that block enqueue when a GGUF model is selected but no Diffusers component source is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
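The denoising-range fix can be sketched with scalar sigmas: slice the schedule to the fractional [denoising_start, denoising_end] range, then advance with plain Euler updates of the flow-matching ODE rather than scheduler.step(), which keeps an internal step index that a clipped schedule would desynchronize. Helper names are hypothetical; the real sigmas are tensors:

```python
def clip_sigmas(sigmas, denoising_start, denoising_end):
    """Keep only the slice of the sigma schedule covering the fractional
    denoising range. sigmas has num_steps + 1 entries, ending at the
    terminal value (0.0 for flow matching)."""
    n = len(sigmas) - 1
    lo = int(round(denoising_start * n))
    hi = int(round(denoising_end * n))
    return sigmas[lo:hi + 1]


def euler_step(latents, noise_pred, sigma, sigma_next):
    """One manual Euler step of the flow-matching ODE:
    x_next = x + (sigma_next - sigma) * v_pred."""
    return latents + (sigma_next - sigma) * noise_pred
```

With a 4-step schedule `[1.0, 0.75, 0.5, 0.25, 0.0]` and `denoising_start=0.5`, only the last two steps are executed — the img2img-strength behavior the review asked for.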
lstein
marked this pull request as draft
Denoise tests (13):
- _prepare_cfg_scale: scalar, list, mismatch, invalid type
- _compute_sigmas: default schedule, shift override, terminal stretch, monotonicity, step counts, image_seq_len affecting mu
- _pack/_unpack_latents: roundtrip, shape verification

Model loader tests (5):
- Diffusers model extracts all components from itself
- Diffusers model ignores component_source when provided
- GGUF with Diffusers component source succeeds
- GGUF without component source raises ValueError
- GGUF with non-Diffusers source raises ValueError

Text encoder tests (13):
- _build_prompt: 0/1/many images, template structure, special chars
- _resize_for_vl_encoder: large/small images, aspect ratio preservation, dimension rounding, square/portrait/landscape orientations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
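The _pack/_unpack_latents roundtrip being tested can be illustrated with a minimal pure-Python sketch: 2x2 spatial patches of the latent grid are flattened into a token sequence for the transformer, then restored afterward. The actual nodes operate on torch tensors and the exact patch layout here is an assumption:

```python
def pack_latents(lat):
    """lat: nested list of shape (c, h, w) with even h and w. Returns a
    sequence of h//2 * w//2 tokens, each holding one 2x2 patch from
    every channel (c * 4 values per token)."""
    c, h, w = len(lat), len(lat[0]), len(lat[0][0])
    seq = []
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            patch = []
            for ch in range(c):
                patch += [lat[ch][i][j], lat[ch][i][j + 1],
                          lat[ch][i + 1][j], lat[ch][i + 1][j + 1]]
            seq.append(patch)
    return seq


def unpack_latents(seq, c, h, w):
    """Inverse of pack_latents: scatter each token's 2x2 patches back
    onto the (c, h, w) latent grid."""
    lat = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    idx = 0
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            patch = seq[idx]
            idx += 1
            for ch in range(c):
                base = ch * 4
                lat[ch][i][j] = patch[base]
                lat[ch][i][j + 1] = patch[base + 1]
                lat[ch][i + 1][j] = patch[base + 2]
                lat[ch][i + 1][j + 1] = patch[base + 3]
    return lat
```

A roundtrip assertion of exactly this shape is what the listed denoise tests verify.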