OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang
Learning and Vision Lab, National University of Singapore
Features
OminiControl is a minimal yet powerful universal control framework for Diffusion Transformer models like FLUX.
- Universal Control 🌐: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).
- Minimal Design 🚀: Injects control signals while preserving the original model structure, introducing only 0.1% additional parameters to the base model.
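To put the 0.1% overhead in perspective, here is a back-of-envelope sketch. The ~12B parameter count for FLUX.1 is an approximation, not a figure from this repo:

```python
def added_parameter_count(base_params: int, overhead_ratio: float = 0.001) -> int:
    """Extra parameters introduced by the control branch, given the ~0.1% overhead."""
    return int(base_params * overhead_ratio)

# FLUX.1 has on the order of 12 billion parameters (approximate).
base = 12_000_000_000
extra = added_parameter_count(base)
print(f"~{extra / 1e6:.0f}M additional parameters")  # roughly 12M on top of 12B
```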
OminiControlGP (OminiControl for the GPU Poor) by DeepBeepMeep
With just one line adding the 'mmgp' module (https://github.com/deepbeepmeep/mmgp), OminiControl can generate images from a derived FLUX model in less than 6 s with 16 GB of VRAM (profile 1), in 9 s with 8 GB of VRAM (profile 4), or in 16 s with less than 6 GB of VRAM (profile 5).
To run the Gradio app with profile 3 (the default profile; the fastest, but requires the most VRAM):
python gradio_app --profile 3
To run the Gradio app with profile 5 (a bit slower, but requires only 6 GB of VRAM):
python gradio_app --profile 5
You may check the mmgp homepage on GitHub if you want to design your own profiles (for instance, to disable quantization).
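The profile trade-offs above can be summarized as a small lookup helper. This is purely illustrative: the VRAM and timing figures are the ones quoted above, and the helper is not part of mmgp or this repo:

```python
# VRAM and generation-time figures quoted above for OminiControlGP profiles
# (profile 3, the default, needs more VRAM than any profile listed here).
PROFILES = {
    1: {"min_vram_gb": 16, "seconds_per_image": 6},
    4: {"min_vram_gb": 8, "seconds_per_image": 9},
    5: {"min_vram_gb": 6, "seconds_per_image": 16},
}

def fastest_fitting_profile(vram_gb: float) -> int:
    """Pick the fastest listed profile whose VRAM requirement fits the budget."""
    fitting = [p for p, spec in PROFILES.items() if spec["min_vram_gb"] <= vram_gb]
    if not fitting:
        raise ValueError("Less than 6 GB of VRAM: no listed profile fits")
    return min(fitting, key=lambda p: PROFILES[p]["seconds_per_image"])

print(fastest_fitting_profile(8))  # profile 4: fits in 8 GB, faster than profile 5
```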
If you enjoy this application, you will certainly appreciate these ones:
- Hunyuan3D-2GP: https://github.com/deepbeepmeep/Hunyuan3D-2GP. A great image-to-3D and text-to-3D tool by the Tencent team. Thanks to mmgp, it can run with less than 6 GB of VRAM.
- HunyuanVideoGP: https://github.com/deepbeepmeep/HunyuanVideoGP. One of the best open-source text-to-video generators.
- FluxFillGP: https://github.com/deepbeepmeep/FluxFillGP. One of the best inpainting/outpainting tools based on Flux; it can run with less than 12 GB of VRAM.
- Cosmos1GP: https://github.com/deepbeepmeep/Cosmos1GP. This application includes two models: a text-to-world generator and an image/video-to-world generator (probably the best open-source image-to-video generator).
News
- 2025-01-25: ⭐️ DeepBeepMeep fork: added support for mmgp
- 2024-12-26: ⭐️ Training code is released. Now you can create your own OminiControl model by customizing any control task (3D, multi-view, pose-guided, try-on, etc.) with the FLUX model. Check the training folder for more details.
Quick Start
Setup (Optional)
- Environment setup
conda create -n omini python=3.10
conda activate omini
- Requirements installation
pip install -r requirements.txt
Usage example
- Subject-driven generation: examples/subject.ipynb
- In-painting: examples/inpainting.ipynb
- Canny edge to image, depth to image, colorization, deblurring: examples/spatial.ipynb
Guidelines for subject-driven generation
- Input images are automatically center-cropped and resized to 512x512 resolution.
- When writing prompts, refer to the subject using phrases like "this item", "the object", or "it", e.g.:
  - A close up view of this item. It is placed on a wooden table.
  - A young lady is wearing this shirt.
- The model currently works primarily with objects rather than human subjects, due to the absence of human data in training.
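The automatic center-crop described above can be sketched as plain box arithmetic; the repo's actual preprocessing may differ in details:

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# With PIL this would be applied before inference as:
#   image.crop(center_crop_box(*image.size)).resize((512, 512))
print(center_crop_box(800, 600))  # (100, 0, 700, 600)
```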
Generated samples
Subject-driven generation
Demos (Left: condition image; Right: generated image)
Text Prompts
- Prompt1: A close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!.'
- Prompt2: A film style shot. On the moon, this item drives across the moon surface. A flag on it reads 'Omini'. The background is that Earth looms large in the foreground.
- Prompt3: In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.
- Prompt4: On the beach, a lady sits under a beach umbrella with 'Omini' written on it. She's wearing this shirt and has a big smile on her face, with her surfboard behind her. The sun is setting in the background. The sky is a beautiful shade of orange and purple.
More results
Spatially aligned control
- Image Inpainting (Left: original image; Center: masked image; Right: filled image)
- Other spatially aligned tasks (Canny edge to image, depth to image, colorization, deblurring)
Models
Subject-driven control:
| Model | Base model | Description | Resolution |
|---|---|---|---|
| experimental / subject | FLUX.1-schnell | The model used in the paper. | (512, 512) |
| omini / subject_512 | FLUX.1-schnell | Fine-tuned on a larger dataset. | (512, 512) |
| omini / subject_1024 | FLUX.1-schnell | Fine-tuned on a larger dataset; accommodates higher resolution. (To be released) | (1024, 1024) |
| oye-cartoon | FLUX.1-dev | Fine-tuned on the oye-cartoon dataset by @saquib764. | (512, 512) |
Spatially aligned control:
| Model | Base model | Description | Resolution |
|---|---|---|---|
| experimental / &lt;task_name&gt; | FLUX.1 | Canny edge to image, depth to image, colorization, deblurring, in-painting. | (512, 512) |
| experimental / &lt;task_name&gt;_1024 | FLUX.1 | Supports higher resolution. (To be released) | (1024, 1024) |
Community Extensions
- ComfyUI-Diffusers-OminiControl - ComfyUI integration by @Macoron
- ComfyUI_RH_OminiControl - ComfyUI integration by @HM-RunningHub
Limitations
- The model's subject-driven generation primarily works with objects rather than human subjects due to the absence of human data in training.
- The subject-driven generation model may not work well with FLUX.1-dev.
- The released model currently only supports a resolution of 512x512.
Training
Training instructions can be found in this folder.
To-do
- Release the training code.
- Release the model for higher resolution (1024x1024).
Citation
@article{
tan2024omini,
title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
author={Tan, Zhenxiong and Liu, Songhua and Yang, Xingyi and Xue, Qiaochu and Wang, Xinchao},
journal={arXiv preprint arXiv:2411.15098},
year={2024}
}