[Docs Update] QLoRA 4-bit Support on ROCm by Abdennacer-Badaoui · Pull Request #1857 · bitsandbytes-foundation/bitsandbytes

PR Review: #1857 — [Docs Update] QLoRA 4-bit Support on ROCm

Documentation update reflecting PR #1856 (merged), which added blocksize=64 4-bit quantization support for ROCm CDNA devices. Changes the AMD GPU row in the README feature matrix from “Partially Supported” to “Supported” for QLoRA 4-bit, and simplifies the ROCm installation notes.

No blocking issues.

One suggestion: the old text noted that CDNA devices “default to 128” for blocksize. While blocksize=64 is now supported on CDNA after #1856, the code still defaults to 128 there (see functional.py line 873: `blocksize = 64 if not ROCM_WARP_SIZE_64 else 128`). Consider adding a brief note such as:

All features are supported for both consumer RDNA devices and Data Center CDNA products. Note: the default 4-bit blocksize on CDNA is 128; pass blocksize=64 explicitly for smaller blocks.

This would prevent users on CDNA from being surprised that their default blocksize differs from RDNA/CUDA. The author’s own comment on this PR asks the same question (“should we make blocksize=64 the default for ROCm?”), which suggests the discrepancy is known but not yet resolved.
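For clarity, here is a minimal sketch of the selection logic quoted above. Only the ternary expression comes from functional.py; the helper function and its name are hypothetical, added purely to illustrate which default each device family receives:

```python
# Sketch of the default-blocksize selection in bitsandbytes/functional.py.
# ROCM_WARP_SIZE_64 is True on CDNA (warp size 64) and False on RDNA/CUDA;
# the helper below is hypothetical and mirrors the quoted expression only.

def default_4bit_blocksize(rocm_warp_size_64: bool) -> int:
    """Mirror of `blocksize = 64 if not ROCM_WARP_SIZE_64 else 128`."""
    return 64 if not rocm_warp_size_64 else 128

# CDNA gets 128 by default; RDNA and CUDA get 64.
print(default_4bit_blocksize(True))   # CDNA
print(default_4bit_blocksize(False))  # RDNA / CUDA
```

A CDNA user wanting parity with RDNA/CUDA would therefore need to pass blocksize=64 explicitly at the call site, which is exactly what the suggested doc note would tell them.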

  • Security: Clear — only documentation files modified (README.md, docs/source/installation.mdx), no code, no scripts, no CI changes
  • Downstream impact: None
  • Tests: Not applicable (docs-only)
  • CI: Lint and docs build both pass
  • Cross-PR conflicts: Minor file overlap with #1853 (MPS backend) and #1695 (NPU backend) on README.md, but changes are in different table rows — no conflict expected