A new training method called RLSD, from JD.com and academic researchers, aims to solve a common enterprise bottleneck: reasoning models are expensive to train and often get only sparse feedback. RLSD keeps reinforcement learning’s reliable direction while using self-distillation only for credit assignment, avoiding “privileged information leakage.” In tests, it outperformed baseline GRPO and standard OPSD and converged faster.
Huawei says it will ramp up smart-driving R&D with more than $10 billion over the next five years, aiming to boost computing power for training. The move underscores the company’s push to accelerate AI model development for autonomous driving systems, a bet that stronger compute capabilities will translate into faster progress and better performance in next-generation driving technologies.