Fix(model manager): Improve calculation of Z-Image VAE working memory needs by lstein · Pull Request #8740 · invoke-ai/InvokeAI

Summary

This is a speculative PR that may address the OOM issues reported on low-VRAM machines running Z-Image models during the latent decode phase. It dynamically estimates the amount of VRAM needed by the currently selected VAE and increases the `working_mem_bytes` value passed to the `vae.model_on_device()` context manager accordingly.

Related Issues / Discussions

See the Discord discussion that begins at https://discord.com/channels/1020123559063990373/1149506274971631688/1456756274858430611

QA Instructions

Either use a card with a low amount of VRAM (8-12 GB), or run an external process that consumes a fair bit of VRAM so that the available VRAM drops to 8-12 GB. Enable `use_partial_loading`, but disable `max_cache_vram` and `device_working_mem_gb`.
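For reference, the setup above corresponds to a configuration along these lines (using the option names as given in the QA steps; check your `invokeai.yaml` for the exact keys in your install):

```yaml
# invokeai.yaml (sketch, key names per the QA instructions above)
use_partial_loading: true
# max_cache_vram: leave unset/commented out
# device_working_mem_gb: leave unset/commented out
```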

  1. Without this PR, run a generation with a combination of Z-Image model and image dimensions that produces an OOM during the VAE decode phase.
  2. Restart the server after pulling this PR and try the same generation.
  3. Does it complete generation without OOM?

You may need to play with the image size to see an improvement.

Merge Plan

Simple merge

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)