OpenAI’s new GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper aim to cut the heavy engineering overhead behind voice agents. Rather than cramming reasoning, transcription, and translation into one system, OpenAI routes each task to a specialized model, letting enterprises orchestrate voice workflows more cleanly within a 128K context window. The shift could make voice agents cheaper and easier to scale.
OpenAI has unveiled three new audio models for developers, aimed at making voice agents faster, smarter, and more interactive in real time. GPT-Realtime-2 handles complex requests even when users interrupt. GPT-Realtime-Translate delivers live multilingual translation, while GPT-Realtime-Whisper provides instant speech-to-text for captions and notes. Early adopters include Zillow and Priceline.