Fix(model manager): Improve calculation of Z-Image VAE working memory needs by lstein · Pull Request #8740 · invoke-ai/InvokeAI
Summary
This is a speculative PR that may address the OOM issues seen on low-VRAM machines running Z-Image models during the latent decode phase. It dynamically estimates the amount of VRAM needed by the currently selected VAE and increases the working_mem_bytes value passed to the vae.model_on_device() context manager accordingly.
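To illustrate the idea, here is a minimal sketch of how a VAE decode working-memory estimate could be derived from the output image size. All names here are hypothetical and illustrative; they are not InvokeAI's actual API.

```python
# Hypothetical sketch: estimate the VRAM working memory a VAE decode
# might need, based on the decoded image size. The function name,
# defaults, and overhead factor are illustrative assumptions, not
# values taken from this PR.
def estimate_vae_decode_working_memory(
    latent_height: int,
    latent_width: int,
    out_channels: int = 3,        # RGB output
    scale_factor: int = 8,        # spatial upscale from latent to pixel space
    dtype_bytes: int = 2,         # fp16 activations
    overhead_factor: float = 2.5, # fudge factor for intermediate activations
) -> int:
    """Return an estimated working-memory requirement in bytes."""
    out_h = latent_height * scale_factor
    out_w = latent_width * scale_factor
    output_bytes = out_h * out_w * out_channels * dtype_bytes
    return int(output_bytes * overhead_factor)
```

The key point is that the estimate scales with image dimensions, so larger decodes reserve proportionally more working memory instead of using a fixed default.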
Related Issues / Discussions
See the Discord discussion that begins at https://discord.com/channels/1020123559063990373/1149506274971631688/1456756274858430611
QA Instructions
Either use a card with a low amount of VRAM (8-12 GB) or run an external process that uses a fair bit of VRAM in order to reduce the available VRAM to 8-12 GB. Enable use_partial_loading, but disable max_cache_vram and device_working_mem_gb.
- Without this PR, run a generation with a combination of Z-Image model and image dimensions that produces an OOM during the VAE decode phase.
- Restart the server after pulling this PR and try the same generation.
- Does it complete generation without OOM?
You may need to play with the image size to see an improvement.
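Assuming these settings live in your invokeai.yaml (key names here follow the QA text above; verify them against your installed version's configuration docs), the setup might look like:

```yaml
# Hypothetical invokeai.yaml fragment for the QA setup above.
# Key names are taken from the text of this PR and may differ
# from your version's actual config schema.
use_partial_loading: true
# Leave these disabled/commented out so defaults apply:
# max_cache_vram: ...
# device_working_mem_gb: ...
```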
Merge Plan
Simple merge
Checklist
- The PR has a short but descriptive title, suitable for a changelog
- Tests added / updated (if applicable)
- ❗Changes to a redux slice have a corresponding migration
- Documentation added / updated (if applicable)
- Updated What's New copy (if doing a release after this PR)