Accelerating Gemma 4: faster inference with multi-token prediction drafters

By using Multi-Token Prediction (MTP) drafters, Gemma 4 models ease latency bottlenecks and respond faster, improving responsiveness for developers.
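To show why a drafter can cut wait time without changing output, here is a minimal sketch of greedy draft-then-verify (speculative) decoding. It assumes the MTP drafter is used in a standard speculative loop: the drafter proposes several tokens, and the main model verifies them and keeps only the ones it agrees with. The functions `target`, `drafter`, `greedy`, and `speculative_greedy` are hypothetical toy stand-ins, not the Gemma 4, LiteRT-LM, MLX, Transformers, or vLLM APIs. Under greedy decoding the result is token-for-token identical to running the main model alone. With sampling, a rejection-sampling acceptance rule preserves the main model's output distribution instead.

```python
from typing import Callable, List, Tuple

# A "model" here is a hypothetical stand-in: it maps a token prefix to its
# greedy (argmax) next token.
NextToken = Callable[[List[int]], int]


def greedy(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one sequential target-model step per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_greedy(target: NextToken, drafter: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-then-verify decoding; returns (tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        rounds += 1
        # 1. The cheap drafter proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(out + draft))
        # 2. The target checks the draft. A real system scores all k positions
        #    in one forward pass; this loop calls it per position for clarity.
        accepted: List[int] = []
        for tok in draft:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if tok != expected:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every draft token was accepted: the target's next token comes free.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds


# Hypothetical toy models for the usage example.
def target(prefix: List[int]) -> int:
    return (sum(prefix) * 7 + 3) % 10


def drafter(prefix: List[int]) -> int:
    # Agrees with the target except at every third position.
    t = target(prefix)
    return (t + 1) % 10 if len(prefix) % 3 == 0 else t


tokens, rounds = speculative_greedy(target, drafter, [1, 2], max_new=20)
assert tokens == greedy(target, [1, 2], 20)  # identical output
print(rounds)  # fewer sequential rounds than the 20 greedy steps
```

Each round emits at least one token and often several, so the number of sequential passes through the main model drops while the generated text stays the same. How much faster this is in practice depends on how often the drafter's proposals are accepted.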

Gemma 4 MTP drafter speedups

Tokens-per-second speedups, measured on hardware using LiteRT-LM, MLX, Hugging Face Transformers, and vLLM.

Gemma 4 26B on an NVIDIA RTX PRO 6000: standard inference (left) vs. MTP drafter (right), in tokens per second. Same output quality, half the wait time.