[XPU] Add inference benchmark for XPU by Egor-Krivov · Pull Request #1696 · bitsandbytes-foundation/bitsandbytes
This benchmark prints token-generation latency on XPU, which is useful for choosing between backends on XPU.
python benchmarking/inference_benchmark.py --device xpu --nf4
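The core of a per-token latency measurement can be sketched as follows. This is a minimal illustration, not the actual script: `generate_one_token` is a hypothetical stand-in for the model's decode step, and the real benchmark delegates timing to optimum-benchmark rather than a hand-rolled timer.

```python
import time
from statistics import mean

def benchmark_decode(generate_one_token, num_tokens=32, warmup=4):
    """Time each decode step and return the mean per-token latency in ms."""
    # Warmup iterations are excluded so one-off costs (kernel compilation,
    # cache allocation) do not skew the measurement.
    for _ in range(warmup):
        generate_one_token()
    latencies = []
    for _ in range(num_tokens):
        start = time.perf_counter()
        generate_one_token()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return mean(latencies)

# Example with a dummy decode step that just sleeps ~1 ms:
per_token_ms = benchmark_decode(lambda: time.sleep(0.001))
```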
Benchmarking batch size: 1
Traceback (most recent call last):
  File "/home/jovyan/triton/bitsandbytes/benchmarking/inference_benchmark.py", line 122, in <module>
    backend_config = PyTorchConfig(
  File "<string>", line 37, in __init__
  File "/home/jovyan/triton/intel-xpu-backend-for-triton/.venv/lib/python3.10/site-packages/optimum_benchmark/backends/pytorch/config.py", line 55, in __post_init__
    super().__post_init__()
  File "/home/jovyan/triton/intel-xpu-backend-for-triton/.venv/lib/python3.10/site-packages/optimum_benchmark/backends/config.py", line 102, in __post_init__
    raise ValueError(f"`device` must be either `cuda`, `cpu`, `mps`, `xla` or `gpu`, but got {self.device}")
ValueError: `device` must be either `cuda`, `cpu`, `mps`, `xla` or `gpu`, but got xpu
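The failure comes from optimum-benchmark's device validation in `backends/config.py`, which can be reproduced in isolation. The sketch below mirrors the check quoted in the traceback; the suggestion that the fix is to extend the allowed set with `xpu` is an assumption about the intended change, not confirmed upstream behavior.

```python
# Current allowed set, taken from the error message in the traceback.
ALLOWED_DEVICES = {"cuda", "cpu", "mps", "xla", "gpu"}

def validate_device(device, allowed=ALLOWED_DEVICES):
    # Mirrors the __post_init__ check that rejects `xpu`.
    if device not in allowed:
        raise ValueError(
            f"`device` must be either `cuda`, `cpu`, `mps`, `xla` or `gpu`, but got {device}"
        )
    return device

# `xpu` is rejected by the current list...
try:
    validate_device("xpu")
    rejected = False
except ValueError:
    rejected = True

# ...but would pass once `xpu` is added to the allowed set (hypothetical fix).
accepted = validate_device("xpu", ALLOWED_DEVICES | {"xpu"}) == "xpu"
```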