Support model_parallel for LoRA
Hi, I run out of memory when using LoRA with LLaMA.
So I changed the Hparams to:

```
device: cuda
model_parallel: true
```
But I get another CUDA error:
```
File "/data//miniconda3/envs/chat310new/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 91, in forward
    return self.weight * hidden_states
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
```
Can you help me use LoRA with multiple GPUs?
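For context, the error comes from an input tensor living on a different GPU than the layer's weight once the model is split across devices. A common workaround I've seen suggested (I haven't verified it against this repo) is to move the input onto the layer's own device inside `forward`. A minimal sketch with a toy stand-in module (`ToyNorm` is illustrative, not the actual `LlamaRMSNorm`):

```python
import torch
import torch.nn as nn

class ToyNorm(nn.Module):
    """Toy stand-in for the norm layer at modeling_llama.py line 91."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Move the input to the same device as this layer's weight, so
        # layers placed on cuda:1 can consume activations from cuda:0.
        hidden_states = hidden_states.to(self.weight.device)
        return self.weight * hidden_states

norm = ToyNorm(8)
out = norm(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 8])
```

Would a fix along these lines (casting activations to each parallel shard's device) be the right direction, or is there a supported config for this?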