Google speeds up Gemma4 by up to threefold with Multi-Token Prediction Drafters

Published on 7 May 2026

A new decoding technique drafts several tokens ahead instead of generating them one at a time

Google says its Gemma4 model can generate text up to three times faster using Multi-Token Prediction Drafters, an algorithmic change to how text is decoded. Instead of committing to output one token at a time, the system drafts several likely next tokens at once, reducing costly re-computation during generation. The result is quicker responses without changing the model’s core capabilities.

Multi-Token Prediction Drafters aim to speed up decoding
The approach drafts several likely next tokens at once
Google claims up to three times faster generation
Improves inference speed through an algorithmic change rather than hardware
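
The summary does not say how drafted tokens are checked before they are kept. One common way to use drafted tokens is draft-and-verify (speculative) decoding, and the sketch below illustrates that general idea only; it is not Google's implementation. The names `speculative_decode`, `toy_target`, and `toy_drafter` are hypothetical, and the two toy functions are stand-ins for the full model and a cheap drafter.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "next token" function


def toy_target(context: List[Token]) -> Token:
    """Stand-in for the full model (assumption): counts upward modulo 10."""
    return (context[-1] + 1) % 10


def toy_drafter(context: List[Token]) -> Token:
    """Stand-in for a cheap drafter (assumption): agrees with the target
    except after token 4, where it deliberately guesses wrong."""
    return 9 if context[-1] == 4 else (context[-1] + 1) % 10


def speculative_decode(
    target: NextToken,
    drafter: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding.

    Returns the generated sequence and the number of verification rounds.
    In a real system each round would be one batched forward pass of the
    full model over all drafted positions; here it is simulated with
    per-position calls and counted as a single pass.
    """
    out = list(prompt)
    goal = len(prompt) + max_new
    passes = 0
    while len(out) < goal:
        # 1. The drafter cheaply proposes up to k tokens ahead.
        draft: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            draft.append(drafter(out + draft))

        # 2. The target checks the draft and keeps the matching prefix.
        passes += 1
        accepted: List[Token] = []
        for tok in draft:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's own token and stop.
                accepted.append(expected)
                break
        out.extend(accepted)
    return out, passes


if __name__ == "__main__":
    tokens, passes = speculative_decode(toy_target, toy_drafter, [0], max_new=12)
    print(tokens, passes)
```

Because every kept token is one the target itself would have chosen, the output matches plain token-by-token greedy decoding. In this toy run the 12 new tokens need 4 verification rounds instead of 12 sequential steps. Real-world gains depend on how often the drafter's guesses are accepted.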

#inference #ai models #nlp #google #gemma4


This summary was produced by Beige from a story published on Office Chai.
