embeddings

RecursiveMAS cuts multi-agent latency by passing embeddings instead of text, delivering up to 2.4x faster inference and 75% fewer tokens

Researchers from UIUC and Stanford propose RecursiveMAS, a multi-agent framework in which agents communicate by passing latent embeddings instead of text. Rather than generating reasoning tokens at every step, agents loop continuous representations through RecursiveLink modules and emit text only at the end. Across nine benchmarks, tests show up to 2.4x faster inference, a 75% token reduction by round three, and an 8.3% accuracy gain, with training far cheaper than full fine-tuning.

Venture Beat

Published by Beige on 16 May 2026

Summarised by Beige from a story on Venture Beat on 16 May 2026
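To make the text-versus-latent distinction in the RecursiveMAS summary concrete, here is a toy sketch in Python. It is not the RecursiveMAS implementation: the agents are random matrices, `LINK` is a hypothetical stand-in for a RecursiveLink module, and `run` is an illustrative helper. It only counts how often each design would have to decode text, assuming one decode per agent hand-off in the text baseline and a single final decode in the latent design. It does not reproduce the paper's latency or token figures.

```python
import math
import random

DIM = 4
random.seed(0)


def random_matrix(n):
    """Small random square matrix (toy weights, not a trained model)."""
    return [[random.gauss(0, 1 / math.sqrt(n)) for _ in range(n)] for _ in range(n)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical agents: each is a tiny nonlinear map on a hidden-state vector.
AGENTS = [random_matrix(DIM) for _ in range(3)]
# Hypothetical stand-in for a RecursiveLink: a projection between latent spaces.
LINK = random_matrix(DIM)


def agent_step(weights, h):
    return [math.tanh(x) for x in matvec(weights, h)]


def run(h0, rounds, latent=True):
    """Return (final_state, number_of_text_decodes) for one design."""
    h, decodes = h0, 0
    for _ in range(rounds):
        for w in AGENTS:
            h = agent_step(w, h)
            if latent:
                # Latent design: hand the vector straight to the next agent.
                h = matvec(LINK, h)
            else:
                # Text baseline: a decode (and re-encode) would happen here;
                # the sketch only counts it.
                decodes += 1
    decodes += 1  # both designs decode the final answer once
    return h, decodes


if __name__ == "__main__":
    start = [1.0, 0.0, 0.0, 0.0]
    _, latent_decodes = run(start, rounds=3, latent=True)
    _, text_decodes = run(start, rounds=3, latent=False)
    print(latent_decodes, text_decodes)  # 1 10
```

With three agents and three rounds, the text baseline decodes 10 times and the latent loop once. Real savings depend on how many tokens each decode produces, which this sketch does not model.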

RAG fine-tuning may cut retrieval accuracy by up to 40% and break agent decisions at scale

Redis research warns that fine-tuning RAG embedding models for "compositional sensitivity" can quietly harm general retrieval, cutting accuracy by up to 40% on mid-size production models. The underlying problem: sentences that differ only by a structural change, such as a negation or a role reversal, can end up near-identical in embedding space, and fine-tuning to separate them can degrade general retrieval in ways that common fine-tuning metrics miss. Agentic pipelines are especially vulnerable.

Venture Beat

Published by Beige on 28 Apr 2026

Summarised by Beige from a story on Venture Beat on 28 Apr 2026
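The collapse described in the Redis summary can be shown with a deliberately extreme toy. The sketch below uses made-up three-dimensional word vectors (hypothetical, not from any real model) and mean pooling, which ignores word order entirely. Production RAG embedders are contextual transformers and are not strictly order-blind, so treat this as an illustration of the failure mode, not as Redis's method or a measurement of any real model.

```python
import math

# Hypothetical toy word vectors (made up for illustration).
VECS = {
    "the": [0.1, 0.0, 0.2],
    "dog": [0.9, 0.1, 0.0],
    "bit": [0.0, 0.8, 0.3],
    "man": [0.2, 0.1, 0.9],
    "never": [0.1, 0.1, 0.1],
}


def embed(sentence):
    """Mean-pool word vectors: an order-blind sentence embedding."""
    words = sentence.lower().split()
    dim = len(next(iter(VECS.values())))
    total = [0.0] * dim
    for w in words:
        for i, x in enumerate(VECS[w]):
            total[i] += x
    return [x / len(words) for x in total]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    base = embed("the dog bit the man")
    reversed_roles = embed("the man bit the dog")
    negated = embed("the dog never bit the man")
    # Both values print very close to 1.0.
    print(cosine(base, reversed_roles))
    print(cosine(base, negated))
```

Because mean pooling sums the same words in a different order, the role-reversed pair scores as identical, and adding one small negation vector barely moves the result. A contextual model can separate such pairs better, but the summary's point is that fine-tuning it to do so can cost general retrieval accuracy.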
