Use Nunchaku for 2~4x faster inference with <7GB VRAM usage by juntaosun · Pull Request #99 · bytedance/DreamO

This is my first attempt at contributing a small PR to DreamO~ 😉

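【Highlights】
👉️ Added nunchaku support: 2~4x faster inference and low VRAM usage (<~7GB) with three reference images.
👉️ It can now run on consumer-grade GPUs, e.g. cards with >8GB of VRAM. Have fun! 🎉
👉️ Inference takes only tens of seconds to generate a 1024x1024 image (measured on an NVIDIA RTX 3080).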

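【Main changes】This PR makes the following changes, compatible with both the v1 and the latest v1.1 models:
(1) dreamo_pipeline.py: added a compatible load_dreamo_model_nunchaku
(2) dreamo_generator.py: new file that handles the core loading and quantization logic.
(3) app.py: cleaner webUI code, plus a real-time display of inference progress.
(4) requirements.txt: dependency upgraded to diffusers==0.32.2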

【VRAM usage】How each quantization mode affects VRAM usage, for comparison:

--quant	VRAM	mark
default	24GB	⚠️
int8	16GB	⚠️
nunchaku	6.5GB	✅
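
As a rough illustration of how the table above might be used, here is a minimal sketch that lists which --quant modes fit a given amount of free VRAM. The name modes_that_fit and the choice to treat the listed usage as a hard minimum with no headroom are assumptions made for this example; they are not part of the PR.

```python
# Approximate VRAM usage per --quant mode in GB, copied from the table above.
VRAM_USAGE_GB = {"default": 24.0, "int8": 16.0, "nunchaku": 6.5}


def modes_that_fit(available_gb: float) -> list[str]:
    """Return the --quant modes whose listed usage fits in available_gb.

    Hypothetical helper: it treats the table values as hard minimums,
    which is an assumption; real usage varies with resolution and inputs.
    """
    return [mode for mode, need in VRAM_USAGE_GB.items() if need <= available_gb]


# With 8 GB free, only the nunchaku mode fits according to the table.
print(modes_that_fit(8))
```

In practice it is probably wise to leave some headroom above these figures, which matches the ">8GB" recommendation for the nunchaku mode.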

app.py startup parameters:

--quant: default, int8, or nunchaku

【Installation】To install the latest version of Nunchaku, see:
https://github.com/mit-han-lab/nunchaku

【Running】
Run app.py, open the webUI page, and use nunchaku for fast inference.

parser.add_argument('--quant', type=str, default='nunchaku', help='Quantize to use: default, int8, nunchaku')
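
For context, the flag above could be wired into a standalone parser roughly as follows. This is only a sketch: build_parser and QUANT_MODES are hypothetical names, and the choices= validation is an addition for illustration that the PR's own line does not include.

```python
import argparse

# The three modes listed for --quant in this PR.
QUANT_MODES = ("default", "int8", "nunchaku")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the --quant flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="DreamO webUI (sketch)")
    parser.add_argument(
        "--quant",
        type=str,
        default="nunchaku",
        choices=QUANT_MODES,  # assumption: reject unknown modes up front
        help="Quantize to use: default, int8, nunchaku",
    )
    return parser


# Parsing an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["--quant", "int8"])
print(args.quant)
```

With choices= set, argparse exits with a usage error on an unrecognized mode instead of passing it through to the model loader.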

a person playing guitar in the street，lookat the viewer.