hust-wx - Overview
SageAttention-Win-Ada8.9 (Public)
Forked from thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
Language: CUDA
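The core idea behind quantized attention is to run the expensive QK^T matrix multiply in low precision (e.g. INT8) and dequantize before the softmax. The following is a minimal NumPy sketch of that idea under simple assumptions (symmetric per-tensor INT8 scales, V kept in full precision); it is illustrative only and is not SageAttention's actual CUDA kernel, which uses more refined per-block quantization and smoothing.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor INT8 quantization: map the max magnitude to 127.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quantized_attention(q, k, v):
    # Quantize Q and K to INT8, accumulate QK^T in INT32,
    # then dequantize the scores before softmax.
    q8, sq = quantize_int8(q)
    k8, sk = quantize_int8(k)
    scores = q8.astype(np.int32) @ k8.astype(np.int32).T
    scores = scores.astype(np.float32) * (sq * sk) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 64)).astype(np.float32)
k = rng.standard_normal((8, 64)).astype(np.float32)
v = rng.standard_normal((8, 64)).astype(np.float32)

ref = softmax((q @ k.T) / np.sqrt(64)) @ v   # full-precision reference
out = quantized_attention(q, k, v)
err = float(np.max(np.abs(out - ref)))       # small quantization error
```

On hardware with fast INT8 tensor cores (such as the Ada GPUs this fork targets), the INT32-accumulated INT8 matmul is what delivers the speedup over an FP16 attention kernel.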