A new training method called RLSD, from JD.com and academic researchers, targets a common enterprise bottleneck: reasoning models are expensive to train and typically receive only sparse reward feedback. RLSD keeps reinforcement learning’s reliable optimization direction while using self-distillation solely for credit assignment, avoiding “privileged information leakage.” In the reported experiments, it outperformed the GRPO baseline and standard OPSD, and converged faster.
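For context on the baseline being compared against: GRPO assigns credit by normalizing each sampled completion's reward against its group's statistics. A minimal sketch of that group-relative advantage computation is below; this illustrates GRPO only, not RLSD's self-distillation mechanism, and the function name is illustrative.

```python
# Sketch of GRPO-style group-relative advantages (the baseline mentioned
# above). With one sparse scalar reward per sampled completion, GRPO
# normalizes each reward against the group's mean and standard deviation,
# turning a sparse signal into a low-variance per-sample advantage.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one prompt's group of rewards to zero mean, unit std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four completions for one prompt; only one earns the sparse reward.
advs = group_relative_advantages([0.0, 0.0, 1.0, 0.0])
```

The rewarded completion gets a positive advantage and the rest get negative ones, so the policy gradient pushes toward the successful sample even when most rollouts earn zero reward.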