Google says its Gemma4 model can run up to three times faster thanks to Multi-Token Prediction Drafters, an algorithmic upgrade to how text is decoded. Instead of committing token by token, the system drafts multiple likely continuations, reducing costly re-computation during generation. The result is quicker responses without changing the model’s core capabilities.
DeepSeek V4 has arrived in two versions: a powerful Pro model with 1.6 trillion parameters and an efficient Flash variant. The headline feature is a one-million-token context window, enabling far longer and more complex prompts. With aggressive performance gains and pricing momentum, the question is whether the rapid push can be sustained against fast-moving competition.
Your news, in seconds
Get the Beige app — every story in 60 words, updated hourly. Free on iOS & Android.
Swipe through stories, personalise your feed, and save articles for later — all on the app.