Google speeds up Gemma4 by up to threefold with Multi-Token Prediction Drafters
Technology
Published on 7 May 2026

A new decoding trick lets tokens be guessed ahead
Google says its Gemma4 model can run up to three times faster thanks to Multi-Token Prediction Drafters, an algorithmic change to how text is decoded. Instead of committing to one token per step, the system drafts several likely next tokens at once, reducing costly re-computation during generation. The result is quicker responses without changing the model’s core capabilities.
- Multi-Token Prediction Drafters aim to speed up decoding
- The approach drafts several likely next tokens at once
- Google claims up to three times faster generation
- The speedup comes from an algorithmic change, not from new hardware
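
The summary does not say how drafted tokens are checked, so the following is only a minimal sketch of the generic draft-and-verify pattern (often called speculative decoding), not Google's implementation. It assumes greedy decoding and two hypothetical stand-in functions, `target_next` (the large model) and `draft_next` (the cheap drafter); every name here is illustrative.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def draft_and_verify(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    draft_len: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding (illustrative sketch).

    Returns the generated tokens and the number of verification rounds.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new_tokens:
        rounds += 1
        # 1. The cheap drafter proposes draft_len tokens in a row.
        draft: List[Token] = []
        for _ in range(draft_len):
            draft.append(draft_next(out + draft))
        # 2. The target model checks each drafted position. A real system
        #    would score all positions in one batched forward pass; this
        #    toy version loops to stay readable.
        accepted: List[Token] = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok != expected:
                # First mismatch: keep the target's own token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so the target adds one more.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[len(prompt):len(prompt) + max_new_tokens], rounds


# Toy stand-ins (assumptions): the target counts upward modulo 10, and
# the drafter agrees except right after a 5, where it guesses wrong.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


prompt = [0]
generated, rounds = draft_and_verify(target_next, draft_next, prompt, 12)

# Baseline: plain one-token-at-a-time greedy decoding with the target.
baseline: List[Token] = []
for _ in range(12):
    baseline.append(target_next(prompt + baseline))

print(generated == baseline, rounds)
```

In this sketch the output is identical to ordinary token-by-token decoding because the target model has the final say on every position; any speedup comes only from needing fewer verification rounds than tokens when the drafter guesses well.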
Read the full story at Office Chai
This summary was produced by Beige from a story published on Office Chai.
