Sakana AI says its RL Conductor turns a small 7B model into an “orchestra conductor” that dynamically routes tasks across GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and open-source workers. Instead of relying on hardcoded pipelines like those built with LangChain, it learns coordination via reinforcement learning, cutting tokens and API calls while boosting reasoning and coding benchmark scores. The tech now powers Sakana Fugu’s enterprise API.
Alibaba researchers say their Metis agent, trained with HDPO reinforcement learning, cuts redundant tool use from 98% to 2% by treating accuracy and efficiency as separate learning signals. The approach targets “trigger-happy” behavior that slows agents, inflates API costs, and injects noisy context. Metis also reaches top-tier reasoning and visual-document performance across benchmarks.
A new training method called RLSD from JD.com and academic researchers aims to solve a common enterprise bottleneck: reasoning models are expensive to train and often get only sparse feedback. RLSD keeps reinforcement learning’s reliable direction while using self-distillation only for credit assignment, avoiding “privileged information leakage.” In tests, it beat baseline GRPO and standard OPSD with faster convergence.
Researchers at SII-GAIR unveiled ASI-EVOLVE, an agentic AI-for-AI system that runs a continuous learn-design-experiment-analyze loop to optimize the full foundation-model stack. In tests it created novel linear attention architectures, improved pretraining pipelines, and designed reinforcement learning algorithms, raising some benchmark scores by more than 18 points while reducing the need for constant human intervention.
A new wave of AI is moving into semiconductor chip design. Generative AI combined with reinforcement learning can optimize component placement during floorplanning, a task traditionally handled by expert engineers. The result: product-development cycles that once took weeks may now be compressed into hours, speeding prototypes and reducing time-to-market significantly.