inference

Enterprise AI is wasting GPUs at 5%—and the $401 billion bill is finally coming due

Enterprises that joined the “GPU scramble” now face a harsh reality: average GPU utilization is stuck near 5%, leaving the hardware as idle, depreciating CapEx. Gartner projects $401B in new AI infrastructure spending, but a shift is underway toward inference-first economics, tighter TCO control, and governance that ensures the data powering agentic systems can be trusted.

VentureBeat

Published by Beige on 8 May 2026

Summarised by Beize from a story on VentureBeat on 8 May 2026
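To make the 5% utilization figure concrete, here is a minimal arithmetic sketch. The function name and the 2.00 hourly cost are hypothetical placeholders chosen for illustration and do not come from the story. Only the 5% utilization rate is taken from the summary above.

```python
def cost_per_useful_gpu_hour(hourly_cost: float, utilization: float) -> float:
    """Effective cost of one hour of real work when a GPU is busy only part of the time."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the interval (0, 1]")
    return hourly_cost / utilization


# Hypothetical all-in cost of owning one GPU for one hour (illustrative only).
HOURLY_COST = 2.00

# Utilization rate quoted in the summary above.
UTILIZATION = 0.05

effective = cost_per_useful_gpu_hour(HOURLY_COST, UTILIZATION)
multiplier = effective / HOURLY_COST
print(f"effective cost per useful hour: {effective:.2f}")
print(f"multiplier over the nominal hourly cost: {multiplier:.0f}x")
```

Whatever the real hourly cost is, the multiplier depends only on utilization: at 5% busy time, each useful GPU-hour costs about twenty times the nominal hourly rate.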

Google speeds up Gemma4 threefold with multi-token prediction drafters

Google says its Gemma4 model can run up to three times faster thanks to Multi-Token Prediction Drafters, an algorithmic upgrade to how text is decoded. Instead of committing token by token, the system drafts multiple likely continuations, reducing costly re-computation during generation. The result is quicker responses without changing the model’s core capabilities.

Office Chai

Published by Beige on 7 May 2026

Summarised by Beize from a story on Office Chai on 7 May 2026
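The draft-then-confirm idea in the summary above can be sketched with a toy greedy "draft and verify" loop. This is a generic illustration of speculative decoding under stated assumptions, not Google's implementation. `target_next` and `draft_next` are hypothetical stand-ins for a large model and a cheap drafter, and the vocabulary of 50 integer tokens is made up.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(prefix: List[Token]) -> Token:
    # Hypothetical "large model": a deterministic function of the prefix.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_next(prefix: List[Token]) -> Token:
    # Hypothetical cheap drafter: agrees with the target except when the
    # prefix length is a multiple of 4, where it guesses wrong on purpose.
    guess = target_next(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(next_fn: NextFn, prompt: List[Token], n_new: int) -> List[Token]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_fn(out))
    return out


def speculative_decode(
    target: NextFn, draft: NextFn, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(out) < goal:
        # 1. Draft up to k tokens with the cheap model.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. Verify the whole draft. A real system scores all drafted
        #    positions in a single batched pass of the large model; this
        #    sketch counts that as one pass and checks positions in a loop.
        verify_passes += 1
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # First mismatch: keep the accepted prefix, then use the
                # target's own token so the output matches plain greedy decoding.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out, verify_passes


prompt = [1, 2, 3]
reference = greedy_decode(target_next, prompt, 20)
result, passes = speculative_decode(target_next, draft_next, prompt, 20, k=4)
print("same output as plain decoding:", result == reference)
print("verification passes:", passes, "vs 20 sequential steps")
```

Because every drafted token is checked against the large model before it is kept, the generated text is unchanged. Any speed-up comes from needing fewer sequential passes of the large model, and it depends on how often the drafter guesses correctly.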

Anthropic in talks to buy AI inference chips from UK startup Fractile for faster, cheaper deployment

Anthropic is reportedly in talks to buy AI inference chips from UK startup Fractile, signaling a push to secure specialized hardware for running models at scale. The deal could help Anthropic improve performance, reduce inference costs, and strengthen supply options beyond its current chip partners, if negotiations lead to a finalized agreement.

The Economic Times

Published by Beige on 2 May 2026

Summarised by Beize from a story on The Economic Times on 2 May 2026
