sub-quadratic attention by Birch-san · Pull Request #1 · Birch-san/diffusers
…hannels_per_head] in order to make use of batched matmuls. fuse the multiply into the matmul. breaks bias and mask support in exchange for a massive speedup.
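A minimal sketch of what that fusion can look like in PyTorch; the function name and shapes here are illustrative, not lifted from the diff. Flattening heads into the batch dimension lets one batched matmul cover every head, and `torch.baddbmm`'s `alpha` folds the softmax scale into that matmul, which is also why a separate attention bias no longer has anywhere to live:

```python
import torch

def scaled_scores(q: torch.Tensor, k: torch.Tensor, scale: float) -> torch.Tensor:
    # q, k: [batch * heads, tokens, channels_per_head] -- heads folded into the
    # batch dim so a single batched matmul covers every head at once
    return torch.baddbmm(
        torch.empty(q.shape[0], q.shape[1], k.shape[1], dtype=q.dtype, device=q.device),
        q,
        k.transpose(1, 2),
        beta=0,       # with beta=0 the `input` tensor is ignored, so uninitialized memory is fine
        alpha=scale,  # the fused multiply; a real bias would need beta=1 and an initialized input
    )
```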
…al kv_chunk_size: callers can notice when no chunking would happen at all, and use a fast path. note: there's a question of whether that concern belongs *inside* the algorithm, but it'd feel weird for chunked attention to have a no-chunking-at-all branch.
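A caller-side sketch of that fast path, assuming tensors shaped [batch * heads, tokens, channels_per_head] as above; `attend`, `standard_attention`, and the `chunked_attention` parameter are hypothetical stand-ins, not the PR's actual API:

```python
import torch

def standard_attention(q, k, v):
    # plain softmax(QK^T * scale) V over [batch * heads, tokens, channels_per_head]
    scale = q.shape[-1] ** -0.5
    scores = torch.bmm(q, k.transpose(1, 2)) * scale
    return torch.bmm(scores.softmax(dim=-1), v)

def attend(q, k, v, chunked_attention, kv_chunk_size=None):
    # `chunked_attention` stands in for the PR's chunked implementation
    kv_tokens = k.shape[1]
    chunk = min(kv_chunk_size or kv_tokens, kv_tokens)
    if chunk >= kv_tokens:
        # a single chunk would span every kv token, i.e. no chunking at all:
        # skip the chunked machinery rather than give the algorithm a
        # no-chunking-at-all branch of its own
        return standard_attention(q, k, v)
    return chunked_attention(q, k, v, kv_chunk_size=chunk)
```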