hust-wx - Overview

SageAttention-Win-Ada8.9 (Public)

Forked from thu-ml/SageAttention

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

CUDA · 1 star
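
A minimal usage sketch, assuming the Python API exposed by the upstream thu-ml/SageAttention repository (the `sageattn` function and its `tensor_layout`/`is_causal` arguments); exact shapes, dtypes, and availability on this Windows/Ada (sm_89) fork are assumptions, not confirmed by this listing.

```python
# Sketch: using SageAttention's quantized kernel as a drop-in replacement for
# scaled dot-product attention, assuming the upstream sageattention package.
import torch
from sageattention import sageattn

# Hypothetical tensors in (batch, heads, seq_len, head_dim) order ("HND" layout).
q = torch.randn(2, 16, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 16, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 16, 1024, 64, dtype=torch.float16, device="cuda")

# Quantized attention; intended to stand in for
# torch.nn.functional.scaled_dot_product_attention(q, k, v).
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
print(out.shape)  # expected: torch.Size([2, 16, 1024, 64])
```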