Enterprises caught up in the “GPU scramble” now face a harsh reality: average enterprise GPU utilization is stuck near 5%, leaving the hardware as idle, depreciating CapEx. Gartner projects $401B in new AI infrastructure spending, but the focus is shifting toward inference-first economics, tighter TCO, and governance that ensures the data powering agentic systems can be trusted.
Google says its Gemma4 model can run up to three times faster thanks to Multi-Token Prediction Drafters, an algorithmic upgrade to how text is decoded. Instead of committing one token at a time, the system drafts several likely tokens ahead, reducing costly re-computation during generation. The result is quicker responses without changing the model’s core capabilities.
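For readers curious how drafting ahead can save work, here is a minimal toy sketch of the general draft-and-verify idea in Python. It is not Google's implementation: the functions `draft_and_verify`, `greedy_decode`, `target_next`, and `draft_next` are hypothetical stand-ins, the "models" are simple arithmetic rules, and real systems score all drafted positions in one batched forward pass rather than in a loop.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def greedy_decode(target_next: NextTokenFn, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one call to the large model per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target_next(tokens))
    return tokens


def draft_and_verify(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    draft_len: int = 4,
) -> Tuple[List[Token], int]:
    """Toy draft-and-verify loop (assumed structure, for illustration only)."""
    tokens = list(prompt)
    verify_rounds = 0  # stands in for batched passes of the large model
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. A cheap drafter proposes several tokens ahead.
        draft: List[Token] = []
        for _ in range(draft_len):
            draft.append(draft_next(tokens + draft))

        # 2. The large model checks the draft. A real system would score
        #    every drafted position in a single batched pass.
        verify_rounds += 1
        accepted: List[Token] = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                # First mismatch: keep the large model's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Whole draft accepted: the same pass yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], verify_rounds


# Hypothetical toy "models": the target counts upward modulo 10,
# and the drafter agrees except right after a 2.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


output, rounds = draft_and_verify(target_next, draft_next, [0], max_new_tokens=8)
baseline = greedy_decode(target_next, [0], 8)
```

In this sketch the output always matches what the large model would have produced on its own, because every drafted token is checked before it is kept. Any speedup comes only from needing fewer verification rounds than tokens generated, which depends on how often the drafter guesses correctly.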
Anthropic is reportedly in talks to buy AI inference chips from UK startup Fractile, signaling a push to secure specialized hardware for running models at scale. If negotiations lead to a final agreement, the deal could help Anthropic improve performance, reduce inference costs, and strengthen supply options beyond its current chip partners.