Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy

Ben Dickson | February 12, 2026

Researchers at Nvidia have developed a technique that can reduce the memory costs of large language model (LLM) reasoning by up to eight times. Their technique, called dynamic memory sparsification (DMS), compresses the key-value (KV) cache, the temporary memory LLMs generate and store as they process prompts and reason through problems and documents.

While researchers have proposed various methods to compress this cache before, most struggle to do so without degrading the model's intelligence. Nvidia's approach discards much of the cache while maintaining (and in some cases improving) the model's reasoning capabilities.

Experiments show that DMS enables LLMs to "think" longer and explore more solutions…
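To make the general idea concrete, here is a minimal sketch of KV-cache sparsification in PyTorch. This is not Nvidia's DMS algorithm; the `sparsify_kv_cache` function and the per-token `importance` scores are illustrative assumptions, showing only the broad principle of evicting low-importance cache entries to fit an 8x smaller memory budget.

```python
# Minimal sketch of KV-cache sparsification. This is NOT Nvidia's DMS;
# it only illustrates the general idea of evicting low-importance entries.
import torch

def sparsify_kv_cache(keys, values, importance, keep_ratio=0.125):
    """Keep the top `keep_ratio` fraction of cached tokens by importance.

    keys, values: [num_tokens, head_dim] cached tensors
    importance:   [num_tokens] hypothetical per-token importance scores
                  (e.g., accumulated attention weights)
    """
    num_keep = max(1, int(keys.shape[0] * keep_ratio))
    # Indices of the most important cached tokens, kept in sequence order
    top = torch.topk(importance, num_keep).indices.sort().values
    return keys[top], values[top]

# Usage: a keep_ratio of 1/8 corresponds to an 8x cache compression
keys = torch.randn(1024, 64)
values = torch.randn(1024, 64)
importance = torch.rand(1024)  # stand-in for real attention statistics
small_k, small_v = sparsify_kv_cache(keys, values, importance)
print(small_k.shape)  # torch.Size([128, 64])
```

A fixed top-k heuristic like this is the simplest form of cache eviction; the appeal of a learned, dynamic scheme such as DMS is deciding what to discard in a way that preserves, or even improves, reasoning quality.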

Read more on VentureBeat