Python package
KV cache management for efficient attention computation during inference.
This package provides implementations for managing the key-value caches used in transformer models. The paged attention implementation enables efficient memory management by partitioning cache memory into fixed-size pages, which improves memory utilization and supports prefix caching.
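The core idea behind paged allocation can be illustrated with a short sketch. This is a minimal, hypothetical model of the technique, not the package's actual implementation: the cache is divided into fixed-size pages, and each sequence holds a list of page indices rather than one contiguous slab, so freed pages can be reused by any sequence.

```python
class PagePool:
    """Illustrative tracker for fixed-size KV cache pages (not the library's API)."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size              # tokens stored per page
        self.free_pages = list(range(num_pages))

    def pages_needed(self, num_tokens: int) -> int:
        # Ceiling division: a partially filled final page still occupies a page.
        return -(-num_tokens // self.page_size)

    def allocate(self, num_tokens: int) -> list[int]:
        n = self.pages_needed(num_tokens)
        if n > len(self.free_pages):
            raise MemoryError("KV cache pool exhausted")
        pages, self.free_pages = self.free_pages[:n], self.free_pages[n:]
        return pages

    def release(self, pages: list[int]) -> None:
        # Returned pages become available to any other sequence.
        self.free_pages.extend(pages)


pool = PagePool(num_pages=8, page_size=16)
seq_pages = pool.allocate(num_tokens=40)   # 40 tokens fit in 3 pages of 16
```

Because a sequence's pages need not be contiguous, prefix caching falls out naturally: two sequences sharing a prompt prefix can reference the same pages for that prefix.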
Functions
load_kv_manager: Load and initialize a KV cache manager.
available_port: Find an available TCP port for transfer engine communication.
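Finding a free TCP port is commonly done by binding to port 0 and letting the OS choose. The following is a generic sketch of that technique; the function name here is hypothetical and its signature is not taken from this package.

```python
import socket

def find_available_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))       # port 0 -> OS assigns a free port
        return s.getsockname()[1]      # (host, port) tuple; take the port

port = find_available_port()
```

Note that a port obtained this way can in principle be claimed by another process between the check and its actual use, so callers typically bind it promptly.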
Modules
registry: KV cache manager factory functions and utilities.
Packages
paged_kv_cache: Paged attention KV cache implementation.
Classes
PagedKVCacheManager: Manager for paged KV cache with data and tensor parallelism support.
KVTransferEngine: Manages KV cache transfers between devices in distributed settings.
KVTransferEngineMetadata: Metadata for KV cache transfer engine configuration.
TransferReqData: Data structure for KV cache transfer requests.
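To give a feel for the kind of information a transfer request carries, here is a hypothetical sketch. The class name and every field below are assumptions for illustration only, not the fields of the package's TransferReqData.

```python
from dataclasses import dataclass

@dataclass
class TransferRequestSketch:
    """Hypothetical shape of a KV cache transfer request (illustrative only)."""
    request_id: str        # identifies the inference request being moved
    src_device: int        # device currently holding the cached pages
    dst_device: int        # device that should receive them
    page_indices: list[int]  # which cache pages to transfer

req = TransferRequestSketch("req-1", src_device=0, dst_device=1, page_indices=[3, 7])
```

In a distributed setting, the transfer engine would consume such requests to move the relevant pages between devices without recomputing attention for the transferred tokens.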