OpenAI adds GPT-5-class reasoning to real-time voice with modular models for orchestration

Published on 8 May 2026

Voice agents may stop needing costly state resets

OpenAI’s new GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper aim to cut the heavy engineering overhead behind voice agents. Rather than cramming reasoning, transcription, and translation into one system, OpenAI routes each task to specialized models, letting enterprises orchestrate more cleanly within a 128K context window. The shift could make voice agents cheaper and easier to scale.

GPT-Realtime-2 brings GPT-5-class reasoning to voice, handling harder requests more smoothly
Translation and transcription are split out into dedicated orchestration primitives
Enterprises can route tasks to the right model instead of one all-in voice stack
Evaluations now focus on orchestration architecture and 128K state management
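
As a rough illustration of the routing idea in the points above, here is a minimal sketch in Python. It makes no API calls; the lowercase model identifiers, the `Task` enum, and the `route` helper are all hypothetical names chosen for this example, not confirmed OpenAI identifiers.

```python
from enum import Enum


class Task(Enum):
    """Kinds of work a voice agent might hand off (illustrative only)."""
    REASON = "reason"
    TRANSLATE = "translate"
    TRANSCRIBE = "transcribe"


# Assumption: identifiers are derived from the model names in the story;
# the real API names may differ.
MODEL_FOR_TASK = {
    Task.REASON: "gpt-realtime-2",
    Task.TRANSLATE: "gpt-realtime-translate",
    Task.TRANSCRIBE: "gpt-realtime-whisper",
}


def route(task: Task) -> str:
    """Return the specialized model assumed to handle the given task."""
    return MODEL_FOR_TASK[task]


if __name__ == "__main__":
    for task in Task:
        print(f"{task.value} -> {route(task)}")
```

The point of the sketch is only that each task type maps to one dedicated model, rather than a single all-in voice stack handling everything.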

#voice agents #ai orchestration #gpt-5 #openai #speech to text

This summary was produced by Beige from a story published on VentureBeat.
