reinforcement learning

Sakana’s RL Conductor trains a tiny 7B model to orchestrate GPT, Claude and Gemini at low cost

Sakana AI says its RL Conductor turns a small 7B model into an “orchestra conductor” that dynamically routes tasks across GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and open-source workers. Instead of relying on hardcoded pipelines such as those built with LangChain, it learns coordination via reinforcement learning, cutting tokens and API calls while boosting reasoning and coding benchmark scores. The technology now powers Sakana Fugu’s enterprise API.

Venture Beat

Published by Beige on 8 May 2026

Summarised by Beize from a story on Venture Beat on 8 May 2026

Alibaba Metis slashes redundant AI tool calls from 98% to 2% while boosting reasoning accuracy

Alibaba researchers say their Metis agent, trained with HDPO reinforcement learning, cuts redundant tool use from 98% to 2% by teaching accuracy and efficiency as separate learning signals. The approach targets “trigger-happy” behavior that slows agents, inflates API costs, and injects noisy context. Metis also reaches top-tier reasoning and visual-document performance across benchmarks.

Venture Beat

Published by Beige on 30 Apr 2026

Summarised by Beize from a story on Venture Beat on 30 Apr 2026

RLSD lets enterprises train custom AI reasoning models with far less compute and better stability

A new training method called RLSD from JD.com and academic researchers aims to solve a common enterprise bottleneck: reasoning models are expensive to train and often get only sparse feedback. RLSD keeps reinforcement learning’s reliable direction while using self-distillation only for credit assignment, avoiding “privileged information leakage.” In tests, it beat baseline GRPO and standard OPSD with faster convergence.

Venture Beat

Published by Beige on 29 Apr 2026

Summarised by Beize from a story on Venture Beat on 29 Apr 2026

New AI framework ASI-EVOLVE autonomously upgrades data, architectures and learning rules, surpassing human baselines

Researchers at SII-GAIR unveiled ASI-EVOLVE, an agentic AI-for-AI system that runs a continuous learn-design-experiment-analyze loop to optimize the full foundation-model stack. In tests it created novel linear attention architectures, improved pretraining pipelines, and designed reinforcement learning algorithms, boosting benchmark scores by more than 18 points in the best cases while reducing the need for constant human intervention.

Venture Beat

Published by Beige on 28 Apr 2026

Summarised by Beize from a story on Venture Beat on 28 Apr 2026

AI is reshaping chip design as generative systems cut floorplanning timelines from weeks to hours

A new wave of AI is moving into semiconductor chip design. Generative AI combined with reinforcement learning can optimize component placement during floorplanning, a task traditionally handled by expert engineers. The result: product-development cycles that once took weeks may now be compressed into hours, speeding prototypes and reducing time-to-market significantly.

The Economic Times

Published by Beige on 24 Apr 2026

Summarised by Beize from a story on The Economic Times on 24 Apr 2026
