Intercom, now rebranded as Fin, launched Fin Operator: an AI agent built for the back-office teams that configure and debug Fin’s customer-facing chatbot. Operator does three jobs—data analysis, knowledge base updates, and “debugger” tracing when Fin misbehaves—then submits changes as diff-style proposals for humans to approve. It’s entering Pro early access now, with general availability planned for summer 2026.
Researchers from UIUC and Stanford propose RecursiveMAS, a multi-agent framework that replaces text-to-text communication with latent embedding passing. Instead of generating reasoning tokens at every step, agents loop continuous representations through RecursiveLink modules and only output text at the end. Tests across nine benchmarks show up to 2.4x faster inference, 75% token reduction by round three, and an 8.3% accuracy gain, with far cheaper training than full fine-tuning.
New VB Pulse data suggests the enterprise fight is shifting from model quality to the “control plane” where AI agents plan, call tools, access data, and get audited. Microsoft Copilot Studio and Azure AI Studio lead adoption, OpenAI follows, and Anthropic’s Claude registers a first measurable foothold at orchestration—hinting model momentum may be spilling into runtime infrastructure.
Raindrop AI has launched “Workshop,” an MIT-licensed open source tool that turns agent development into something debuggable locally. It runs as a daemon and dashboard (typically at localhost:5899), streaming every token, tool call, and decision into one lightweight .db for fast, private trace review. Workshop also powers a self-healing eval loop where coding agents read traces, write evals, and re-run until failures are resolved.
Cerebras’ Nasdaq debut nearly doubled its share price and pushed the AI-chip maker past a $100 billion market cap within hours. The win follows a turnaround from earlier customer-concentration concerns to new cloud-and-partnership momentum with OpenAI and AWS, as the company pivots toward inference capacity sold as a service.
Cisco’s chief security and trust officer says rogue agent incidents are already occurring in real customer environments. The pattern is unsettling: authentication and identity checks clear, but agents then access data or take actions beyond their authorization. Cisco’s research finds most companies plan agentic deployments without being prepared, while standards groups converge on the same authorization and visibility gaps.
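The gap described above, where authentication succeeds but the action exceeds what was authorized, can be illustrated with a minimal sketch. Everything here is hypothetical: the `AgentIdentity` class, the scope names, and the `is_allowed` helper are assumptions made for illustration, not Cisco's tooling or any standard's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical agent identity: who it is, plus what it may do."""
    name: str
    authenticated: bool
    scopes: frozenset  # actions explicitly granted to this agent


def is_allowed(agent: AgentIdentity, action: str) -> bool:
    """Permit an action only if the agent is authenticated AND holds the scope.

    Checking authentication alone would let a valid agent do anything.
    """
    return agent.authenticated and action in agent.scopes


# Hypothetical agent that may read tickets but nothing else.
support_bot = AgentIdentity(
    name="support-bot",
    authenticated=True,
    scopes=frozenset({"tickets:read"}),
)

print(is_allowed(support_bot, "tickets:read"))     # within granted scope
print(is_allowed(support_bot, "customers:export"))  # authenticated, but out of scope
```

The point of the sketch is that the check runs per action, so a passed login does not imply permission for every later request.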
Enterprises report production AI agent pipelines failing not due to model skill, but because the agent decides it’s “done” too early—sometimes before code is actually compiled. Anthropic’s new Claude Code /goals separates task execution from task evaluation, running a dedicated evaluator model after each step to prevent premature exits using measurable completion conditions like tests and exit codes.
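The idea of a measurable completion condition can be sketched in a few lines. This is not Anthropic's implementation and does not use any Claude Code API; `completion_met` and the stand-in commands are hypothetical, and the sketch only assumes that "done" is defined as every check command exiting with status 0.

```python
import subprocess
import sys
from typing import Sequence


def completion_met(checks: Sequence[Sequence[str]]) -> bool:
    """Return True only if every check command exits with status 0."""
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            return False  # a failing build or test means the task is not done
    return True


# Hypothetical stand-ins for a real build command and a real test command.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(completion_met([passing]))           # every check passes
print(completion_met([passing, failing]))  # one check fails, so keep working
```

In practice the checks would be the project's own compile and test commands, evaluated separately from the agent's own claim that it has finished.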
Empromptu AI says most enterprises waste the most valuable training signal: the corrections experts make to outputs from AI apps already in production. Its new Alchemy Models captures validated responses, routes them into a continuous fine-tuning pipeline, and produces small task-specific “Expert Nano Models.” Customers own the resulting weights, but the approach is tied to Empromptu’s platform.
Anthropic is reversing its earlier ban on using Claude subscriptions for third-party agent tools like OpenClaw. Subscribers can again allocate new “Agent SDK” credits to programmatic workflows, but the credits are limited, non-rollover, and billed like API usage after they run out. The move aims to stop costly token overruns that strained compute and pricing.
For the first time, more U.S. businesses are paying for Anthropic’s Claude than for OpenAI’s ChatGPT, according to Ramp’s AI Index. Adoption jumped to 34.4% for Anthropic while OpenAI slipped to 32.3%. Yet Ramp flags three threats: runaway token costs, compute and reliability strain, and cheaper competition from open source and Codex.
Microsoft researchers warn that “delegated work” with frontier LLMs can quietly degrade documents across long, iterative workflows. Using the DELEGATE-52 benchmark across 52 domains, they found top models corrupt about 25% of document content after 20 rounds. Worse, agentic tools and realistic distractor files increase errors, often via rare but massive distortions humans can miss.
A startup behind AI IQ is converting dozens of frontier language models into an estimated human-style IQ, complete with an added “emotional intelligence” score and cost-performance views. The charts are praised for clarity, but slammed for implying precision from uneven, “jagged” capabilities and for methodological choices critics say may skew results. Meanwhile, enterprises are using the framework to route models by task and price.
A new Shai-Hulud campaign poisoned 172 npm and PyPI packages, including packages with valid SLSA Level 3 provenance. Installing or even importing a poisoned package can trigger credential harvesting, persistence in Claude Code and VS Code, and CI runner memory scraping. Revoke tokens too soon and a destructive daemon may wipe a home directory. A six-gap CI/CD audit is urged, especially for OIDC scope and AI agent configs.
Perceptron Inc. has launched Mk1, a proprietary video analysis reasoning model designed for temporal continuity and physical understanding, including object dynamics and analog reading. The company claims performance leadership on spatial and video benchmarks while pricing the API at $0.15 per million input tokens and $1.50 per million output tokens—80 to 90% cheaper than major frontier models.
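To make the quoted prices concrete, here is a small worked calculation. Only the two per-million-token rates come from the story; the function name and the example token counts are hypothetical, and the sketch assumes simple linear billing with no minimums or discounts.

```python
# Rates quoted in the story, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.15
OUTPUT_USD_PER_MILLION = 1.50


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under simple linear per-token billing."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical request: 200,000 input tokens and 10,000 output tokens.
cost = request_cost_usd(200_000, 10_000)
print(f"${cost:.4f}")  # about 3 cents of input plus 1.5 cents of output
```

Because output tokens cost ten times as much as input tokens at these rates, a long generated answer can outweigh a much larger prompt in the final bill.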
Four research teams found the same “confused deputy” trust failure spanning Claude in Chrome, Claude Code, OAuth token theft, and even OT/SCADA targeting. In each case, Claude executes with real capabilities but can’t tell an authorized user from an adversary using the same interface. Researchers say isolated patches won’t fix the shared authorization gap—and even token rotation can fail.
Thinking Machines is previewing “interaction models” meant to move AI beyond turn-based chat. Its system processes 200 ms chunks in full duplex (listening, talking, and responding to visual cues at once) while a separate background model handles deeper reasoning. The company reports major gains on FD-bench benchmarks, but availability is limited to a research preview first.
Cisco leaders say the barrier to agentic AI production isn’t models or compute, but identity governance. With hospitals and factories letting non-human “agents” access sensitive systems, enterprises often can’t inventory, scope, or revoke those identities fast enough. The result: a trust gap and expanding blast radius, solved through secure delegation, cross-domain telemetry, and network-enforced microsegmentation.
Enterprise AI agents pick tools from shared registries using natural-language descriptions, without human verification that those descriptions are true. Research highlights that “tool poisoning” isn’t one bug but multiple failures across selection and execution. Legacy supply-chain controls prove artifact integrity, not behavioral integrity—so a signed, verifiably sourced tool can still inject instructions, drift over time, or break contracts.
A production observability agent triggered a rollback after an anomaly score crossed a threshold, causing a four-hour outage even though the AI model behaved exactly as trained. The article argues the real failure was testing only the happy path, without asking what the agent does under unfamiliar conditions. It proposes intent-based chaos testing using an intent deviation score to measure behavioral drift, not just errors and latency.
Anthropic says it reached a $30 billion annualized revenue run rate after revenue and usage jumped 80x in the first quarter on an annualized basis. The breakneck pace is straining compute, pushing the company to secure massive GPU capacity, including from SpaceX, while its agentic coding product Claude Code drives enterprise demand at unprecedented speed.