lfopensource - Overview
SageAttention (Public)
Forked from thu-ml/SageAttention
Quantized attention achieves speedups of 2-3x and 3-5x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.
CUDA
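The core idea behind quantized attention can be sketched in a few lines: quantize Q and K to INT8 with a per-tensor scale, do the score matrix multiply in low precision, then dequantize before the softmax. The NumPy sketch below is purely illustrative of that principle; it is not SageAttention's actual CUDA implementation (which uses smoothing and per-block scales, among other techniques), and all function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Reference full-precision attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def quantize_int8(x):
    # Hypothetical per-tensor symmetric INT8 quantization.
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def quantized_attention(q, k, v):
    # Illustrative sketch: quantize Q and K, multiply in low
    # precision (accumulating in int32), then dequantize.
    q_int, q_scale = quantize_int8(q)
    k_int, k_scale = quantize_int8(k)
    scores = (q_int.astype(np.int32) @ k_int.astype(np.int32).T) * (q_scale * k_scale)
    scores = scores / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
ref = attention(q, k, v)
approx = quantized_attention(q, k, v)
err = np.max(np.abs(ref - approx))
print(f"max abs error vs. full precision: {err:.4f}")
```

On hardware with fast INT8 tensor cores, the low-precision matmul is what yields the speedup; the sketch only demonstrates that the quantized result stays close to the full-precision one.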