VentureBeat (Feb 23): Researchers baked 3x inference speedups directly into LLM weights — without speculative decoding