[XPU] Add inference benchmark for XPU by Egor-Krivov · Pull Request #1696 · bitsandbytes-foundation/bitsandbytes

The benchmark prints the latency of token generation on XPU, which is useful for choosing between backends on XPU.
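The core measurement can be sketched as below. This is a minimal, hypothetical helper (not the code from benchmarking/inference_benchmark.py, which drives optimum-benchmark with a full Hugging Face model): it times repeated decode steps and reports the mean per-token latency.

```python
import time

def mean_token_latency_ms(step, n_tokens=32):
    """Hypothetical helper: average per-token decode latency in ms.

    `step` stands in for one token-generation step; on a real XPU run
    you would also synchronize the device before reading the clock so
    queued kernels are included in the measurement.
    """
    step()  # warmup: exclude one-time setup (compilation, cache allocation)
    start = time.perf_counter()
    for _ in range(n_tokens):
        step()
    return (time.perf_counter() - start) / n_tokens * 1e3
```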

python benchmarking/inference_benchmark.py --device xpu --nf4
Benchmarking batch size: 1
Traceback (most recent call last):
  File "/home/jovyan/triton/bitsandbytes/benchmarking/inference_benchmark.py", line 122, in <module>
    backend_config = PyTorchConfig(
  File "<string>", line 37, in __init__
  File "/home/jovyan/triton/intel-xpu-backend-for-triton/.venv/lib/python3.10/site-packages/optimum_benchmark/backends/pytorch/config.py", line 55, in __post_init__
    super().__post_init__()
  File "/home/jovyan/triton/intel-xpu-backend-for-triton/.venv/lib/python3.10/site-packages/optimum_benchmark/backends/config.py", line 102, in __post_init__
    raise ValueError(f"`device` must be either `cuda`, `cpu`, `mps`, `xla` or `gpu`, but got {self.device}")
ValueError: `device` must be either `cuda`, `cpu`, `mps`, `xla` or `gpu`, but got xpu