Transformers


OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under Optimal Squared Error Quantization

A rotation-preconditioned KV cache codec that jointly quantizes coordinate triplets via an octahedral map, achieving state-of-the-art compression across text, video, and audio …

Mark Boss
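The description above names two ingredients: a rotation applied before quantization, and joint quantization of coordinate triplets through an octahedral map. The teaser gives no further detail, so what follows is only a minimal Python sketch of those two generic ideas under stated assumptions, not the OCTOPUS codec. The assumptions are these. The rotation is a randomized Hadamard transform (random sign flips followed by an orthonormal Walsh-Hadamard transform), which needs a power-of-two head dimension. Each triplet is split into a full-precision norm and a unit direction. The direction is mapped to the square [-1, 1]^2 with the standard octahedral normal encoding from graphics, and each of its two coordinates is uniformly quantized to `bits` bits. The "optimal squared error" quantizer from the title is not reproduced, and every function name here (`fwht`, `encode_vector`, `decode_vector` and so on) is hypothetical.

```python
# Hypothetical sketch, NOT the OCTOPUS implementation: the Hadamard rotation,
# float norm storage and uniform quantizer below are illustrative assumptions.
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (len(x) must be a power of two).
    The orthonormal transform is symmetric and orthogonal, so it is its own inverse."""
    x = list(x)
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def make_signs(dim, seed=0):
    """Random +/-1 diagonal; sign flip then Hadamard is the assumed rotation."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def sign_nz(a):
    """Sign that maps 0 to +1, as the octahedral fold requires."""
    return 1.0 if a >= 0.0 else -1.0


def oct_encode(d):
    """Map a 3D unit vector to a point in the square [-1, 1]^2."""
    x, y, z = d
    l1 = abs(x) + abs(y) + abs(z)
    x, y, z = x / l1, y / l1, z / l1  # project onto the octahedron |x|+|y|+|z| = 1
    if z < 0.0:  # fold the lower hemisphere out onto the square's corners
        x, y = (1.0 - abs(y)) * sign_nz(x), (1.0 - abs(x)) * sign_nz(y)
    return x, y


def oct_decode(u, v):
    """Inverse of oct_encode: square point back to a 3D unit vector."""
    z = 1.0 - abs(u) - abs(v)
    x, y = u, v
    if z < 0.0:  # undo the fold for the lower hemisphere
        x, y = (1.0 - abs(v)) * sign_nz(u), (1.0 - abs(u)) * sign_nz(v)
    n = math.sqrt(x * x + y * y + z * z)
    return x / n, y / n, z / n


def quantize(u, bits):
    """Uniform scalar quantizer on [-1, 1] with 2**bits levels."""
    levels = (1 << bits) - 1
    u = min(1.0, max(-1.0, u))
    return int(round((u + 1.0) / 2.0 * levels))


def dequantize(q, bits):
    return q / ((1 << bits) - 1) * 2.0 - 1.0


def encode_vector(vec, signs, bits=8):
    """Rotate, split into triplets, keep (float norm, quantized u, quantized v) per triplet."""
    rotated = fwht([s * x for s, x in zip(signs, vec)])
    rotated += [0.0] * (-len(rotated) % 3)  # zero-pad to a multiple of 3
    codes = []
    for i in range(0, len(rotated), 3):
        t = rotated[i:i + 3]
        norm = math.sqrt(sum(c * c for c in t))
        if norm == 0.0:
            codes.append((0.0, 0, 0))  # direction is irrelevant for a zero triplet
            continue
        u, v = oct_encode([c / norm for c in t])
        codes.append((norm, quantize(u, bits), quantize(v, bits)))
    return codes


def decode_vector(codes, signs, bits=8):
    """Dequantize each triplet, drop the padding, then undo the rotation."""
    out = []
    for norm, qu, qv in codes:
        d = oct_decode(dequantize(qu, bits), dequantize(qv, bits))
        out.extend(norm * c for c in d)
    out = out[:len(signs)]
    return [s * x for s, x in zip(signs, fwht(out))]


def relative_error(a, b):
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return num / math.sqrt(sum(x * x for x in a))
```

With `bits=8`, each triplet costs one stored norm plus 16 bits for its direction. Keeping the norm separate means the octahedral map only ever sees unit vectors, which is the domain it is defined on; a practical codec would presumably quantize the norm as well, which this sketch leaves out.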