sub-quadratic attention by Birch-san · Pull Request #1 · Birch-san/diffusers

@brkirch mentioned this pull request on Dec 27, 2022

@Birch-san

…hannels_per_head] in order to make use of batched matmuls. fuse multiply into matmul. breaks bias, mask in exchange for massive speedup.
…ghts_calc_fn, calc_fn_data) and unused vars
…ul for SD 2.1. but remove value float32, having established that it works without.
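The "fuse multiply into matmul" change above refers to folding the softmax scale into the matrix multiply instead of scaling the full [q_len, k_len] score matrix afterward. A minimal NumPy sketch of the algebraic equivalence (the PR itself works in PyTorch, where the same fusion is available via `torch.baddbmm`'s `alpha` argument; names here are illustrative, not the PR's):

```python
import numpy as np

def scores_fused(q, k, scale):
    """Scaled attention scores with the multiply fused into the matmul.

    Scaling q (shape [..., q_len, d]) before the batched matmul touches
    q_len * d elements instead of the q_len * k_len score matrix, and is
    mathematically identical: (scale * q) @ k.T == scale * (q @ k.T).
    """
    return (q * scale) @ np.swapaxes(k, -1, -2)
```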

@Birch-san

…to prefer fast-path whenever unchunked attention would fit into memory. add kv_chunk_size_min to control the kv_chunk_size=None behaviour, so that sqrt(key_tokens) does not pick too small of a chunk size

@Birch-san

…of chunk key size. improve separation of concerns.

@Birch-san

…al kv_chunk_size: they can notice when no chunking would happen at all, and use fast-path. note: there's a question of whether that concern belongs *inside* the algorithm. but it'd feel weird for chunked attention to have a no-chunking-at-all branch.
… equivalent fast-path for 1 query chunk, 1 kv chunk is already supported inside
…ything in one chunk, to re-use an existing fast-path.
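The chunked algorithm these commits wrap a fast-path around processes query chunks against key/value chunks, carrying a running max and running sum so the softmax can be renormalized across chunks without ever materializing the full score matrix. The PR implements this in PyTorch; below is a toy NumPy sketch of the idea under assumed shapes, not the PR's code:

```python
import numpy as np

def chunked_attention(q, k, v, q_chunk=1024, kv_chunk=1024):
    """softmax(q @ k.T / sqrt(d)) @ v, computed chunk by chunk.

    Peak score memory is O(q_chunk * kv_chunk) instead of
    O(q_len * kv_len); a numerically stable online softmax
    (running max m, running sum s) stitches kv chunks together.
    """
    q_len, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty((q_len, v.shape[1]))
    for qs in range(0, q_len, q_chunk):
        qc = q[qs:qs + q_chunk] * scale
        acc = np.zeros((qc.shape[0], v.shape[1]))  # weighted value sum
        m = np.full((qc.shape[0], 1), -np.inf)     # running max of scores
        s = np.zeros((qc.shape[0], 1))             # running sum of exp
        for ks in range(0, k.shape[0], kv_chunk):
            scores = qc @ k[ks:ks + kv_chunk].T
            m_new = np.maximum(m, scores.max(axis=1, keepdims=True))
            p = np.exp(scores - m_new)
            correction = np.exp(m - m_new)  # rescale earlier chunks
            s = s * correction + p.sum(axis=1, keepdims=True)
            acc = acc * correction + p @ v[ks:ks + kv_chunk]
            m = m_new
        out[qs:qs + q_chunk] = acc / s
    return out
```

With a single query chunk and a single kv chunk, the loop bodies run once and the computation degenerates to ordinary attention, which is why an explicit fast-path for that case is a question of where the branch should live rather than of correctness.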


Beinsezii added a commit to Beinsezii/diffusers that referenced this pull request on Feb 28, 2024
