Google speeds up Gemma4 threefold with multi-token prediction drafters

Technology
Published on 7 May 2026
A new decoding trick lets the model guess tokens ahead

Google says its Gemma4 model can run up to three times faster thanks to Multi-Token Prediction Drafters, an algorithmic change to how text is decoded. Instead of committing to one token at a time, the system drafts several likely next tokens ahead, which cuts the number of slow, step-by-step passes through the full model during generation. The result is quicker responses without changing the model's core capabilities.
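
The story does not describe the internals of Gemma4's drafters. The following is a minimal sketch of the general draft-then-verify idea (greedy speculative decoding). It assumes a cheap drafter and a larger target model that each deterministically pick the next token. All names (target_model, draft_model, speculative_decode, plain_decode) and both toy "models" are hypothetical illustrations. They are not Google's code or any library's API.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_model(seq: List[int]) -> int:
    # Hypothetical stand-in for the large model: (sum of last two tokens) mod 10.
    prev = seq[-2] if len(seq) > 1 else 0
    return (seq[-1] + prev) % 10


def draft_model(seq: List[int]) -> int:
    # Hypothetical cheap drafter, built from the target only to get a toy that
    # is usually right: it guesses wrong whenever the last token is 7.
    guess = target_model(seq)
    return (guess + 1) % 10 if seq[-1] == 7 else guess


def speculative_decode(prompt: List[int], n_new: int, k: int,
                       target: NextToken, draft: NextToken) -> Tuple[List[int], int]:
    """Draft k tokens, keep the prefix the target agrees with, repeat."""
    seq = list(prompt)
    target_passes = 0
    while len(seq) - len(prompt) < n_new:
        # 1. The drafter proposes k tokens autoregressively (cheap steps).
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(seq + drafted))
        # 2. The target checks all drafted positions. Speculative-decoding systems
        #    typically do this in one batched forward pass, so count it once.
        target_passes += 1
        accepted: List[int] = []
        for tok in drafted:
            expected = target(seq + accepted)
            if tok != expected:
                # 3. First mismatch: keep the target's own token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All k drafts accepted: the same pass yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[: len(prompt) + n_new], target_passes


def plain_decode(prompt: List[int], n_new: int, target: NextToken) -> List[int]:
    """Baseline: one target pass per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


out, passes = speculative_decode([1, 2], n_new=20, k=4,
                                 target=target_model, draft=draft_model)
assert out == plain_decode([1, 2], 20, target_model)
print(f"{passes} target passes instead of 20")
```

Every accepted token is one the target would have chosen anyway, so the output matches plain greedy decoding. Only the number of sequential target passes changes, which is consistent with faster responses that leave the model's capabilities unchanged.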

  • Multi-Token Prediction Drafters aim to speed up decoding
  • The approach drafts several likely next tokens at once
  • Google claims up to three times faster generation
  • Improves inference speed through an algorithmic change, not new hardware
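
How much drafting helps depends on how often the drafter's guesses are accepted. The speculative-decoding literature makes a simplifying assumption: each drafted token is accepted independently with probability alpha. Under that assumption, a pass that drafts k tokens emits on average (1 - alpha**(k+1)) / (1 - alpha) tokens. The hypothetical helper below computes that ratio. It is an idealized per-pass figure, not a reproduction of Google's "up to three times" measurement, because real speedups also pay for running the drafter.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per target forward pass, assuming each of the
    k drafted tokens is accepted independently with probability alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft accepted, plus the bonus token from the same pass.
        return float(k + 1)
    # Geometric series 1 + alpha + ... + alpha**k.
    return (1 - alpha ** (k + 1)) / (1 - alpha)


for alpha in (0.6, 0.8, 0.9):
    print(alpha, round(expected_tokens_per_pass(alpha, k=4), 2))
```

With k = 4, acceptance rates of 0.6, 0.8 and 0.9 give about 2.3, 3.4 and 4.1 tokens per target pass. This suggests a roughly threefold gain is plausible only when the drafter is right most of the time.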

This summary was written by Beige for a story published on Office Chai.
