gemma4

Google speeds up Gemma4 threefold with Multi-Token Prediction drafters

Google says its Gemma4 model can run up to three times faster thanks to Multi-Token Prediction Drafters, an algorithmic upgrade to how text is decoded. Instead of committing one token at a time, the system drafts several likely next tokens at once, reducing costly recomputation during generation. The result is quicker responses without changing the model’s core capabilities.
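The story does not describe the internals, so the following is only a minimal sketch of the general draft-and-verify pattern (often called speculative decoding), not Google's implementation. It assumes greedy decoding and uses two hypothetical toy functions, `toy_target` and `toy_drafter`, in place of real models. The point it illustrates is that the output matches ordinary token-by-token decoding while the expensive model is consulted far fewer times.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # context -> next token (greedy)


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def draft_and_verify(target: NextToken, drafter: NextToken,
                     prompt: List[Token], max_new: int,
                     k: int = 4) -> Tuple[List[Token], int]:
    """Toy draft-and-verify loop. Returns (tokens, number of verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter proposes k tokens ahead.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(drafter(out + draft))

        # 2. The target checks every drafted position. A real system would do
        #    this in a single batched forward pass; here it is a plain loop.
        passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + draft[:i])
            if expected == draft[i]:
                accepted.append(draft[i])
            else:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token was accepted, so the pass yields one extra token.
            accepted.append(target(out + draft))
        out.extend(accepted)
    return out[:len(prompt) + max_new], passes


# Hypothetical stand-ins for the large model and its drafter.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token]) -> Token:
    # Agrees with the target except after token 7, to force a rejection.
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10


baseline = greedy_decode(toy_target, [0], 12)
tokens, passes = draft_and_verify(toy_target, toy_drafter, [0], 12, k=4)
```

In this toy run `tokens` equals `baseline`, yet the target is consulted in only a few verification passes instead of once per token. Any real speedup depends on how often the drafter's guesses are accepted and on how cheap the drafter is relative to the main model.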

Office Chai

Published by Beige on 7 May 2026

Summarised by Beize from a story on Office Chai on 7 May 2026
