kv_cache | Modular

Python package

KV cache management for efficient attention computation during inference.

This package provides implementations for managing the key-value caches used in transformer models. The paged attention implementation enables efficient memory management by partitioning cache memory into fixed-size pages, which improves memory utilization and supports prefix caching.
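To make the paging idea concrete, here is a minimal sketch of the bookkeeping a paged KV cache performs: a fixed pool of pages, a per-sequence page table, and page reuse when a sequence finishes. The class and method names below are hypothetical illustrations, not the Modular API.

```python
class PagedKVCache:
    """Illustrative paged KV-cache bookkeeping (not the Modular API).

    Cache memory is split into fixed-size pages; each sequence maps to a
    list of page ids, so memory grows in page granularity rather than by
    pre-allocating a maximum-length buffer per sequence.
    """

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))       # pool of unused pages
        self.page_table: dict[int, list[int]] = {}     # seq_id -> page ids

    def reserve(self, seq_id: int, num_tokens: int) -> list[int]:
        """Ensure seq_id has enough pages for num_tokens; return its pages."""
        pages = self.page_table.setdefault(seq_id, [])
        needed = -(-num_tokens // self.page_size)      # ceiling division
        while len(pages) < needed:
            if not self.free_pages:
                raise MemoryError("KV cache out of pages")
            pages.append(self.free_pages.pop())
        return pages

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's pages to the free pool."""
        self.free_pages.extend(self.page_table.pop(seq_id, []))
```

For example, with `page_size=16`, a 40-token sequence occupies 3 pages; releasing it returns all 3 pages to the pool for other sequences, which is what enables the improved utilization described above.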

Modules

  • registry: KV cache manager factory functions and utilities.
