OpenAI rolls out real-time speech translation supporting 70+ input languages
New update allows applications to 'listen, reason, translate, transcribe, and take action' during live conversations
OpenAI has introduced a major expansion of its developer offerings with new voice intelligence features in its API, aiming to make artificial intelligence (AI) more interactive through real-time speech, translation, and transcription capabilities.
Announced on Thursday, the update brings a suite of tools designed to allow applications to “listen, reason, translate, transcribe, and take action” during live conversations, marking a shift from traditional text-based AI interactions toward more natural voice-driven systems.
At the center of the release is GPT-Realtime-2, a new voice model that enables realistic, conversational audio responses.
OpenAI said the model is built on GPT-5-class reasoning, allowing it to handle more complex user requests than its predecessor, GPT-Realtime-1.5.
The company also unveiled GPT-Realtime-Translate, a real-time translation system designed to keep up with natural speech flow.
It supports more than 70 input languages and 13 output languages, allowing users to engage in multilingual conversations with minimal delay.
In addition, OpenAI launched GPT-Realtime-Whisper, a live speech-to-text tool that transcribes conversations as they happen, expanding on the company’s earlier speech recognition technology.
Together, the new tools are integrated into OpenAI’s Realtime API and are intended for use across industries such as customer service, education, media, live events, and creator platforms.
Developers can use the system to build applications that interact with users in real time through voice rather than text alone.
The company also highlighted safety measures built into the system, noting that guardrails have been added to help prevent misuse.
OpenAI said conversations can be automatically halted if harmful or policy-violating content is detected, a measure intended to reduce risks such as spam, fraud, and abusive behavior.
Pricing for the new tools varies: GPT-Realtime-2 is billed based on token usage, while translation and transcription features are charged per minute.
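For developers budgeting a project, the difference between the two billing models can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the announcement as reported here gives no rates, so every number below is a placeholder, and the names `TokenRate`, `token_cost`, and `per_minute_cost` are hypothetical helpers rather than anything in OpenAI's API. The sketch also assumes per-minute usage is pro-rated by the second, which may not match actual billing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRate:
    """Placeholder rates in USD per one million tokens (not real prices)."""
    usd_per_million_input: float
    usd_per_million_output: float


def token_cost(input_tokens: int, output_tokens: int, rate: TokenRate) -> float:
    """Cost of a token-billed session: each token count times its own rate."""
    total = (input_tokens * rate.usd_per_million_input
             + output_tokens * rate.usd_per_million_output)
    return total / 1_000_000


def per_minute_cost(audio_seconds: float, usd_per_minute: float) -> float:
    """Cost of a minute-billed session, assuming pro-rating by the second."""
    return audio_seconds / 60 * usd_per_minute


if __name__ == "__main__":
    # All figures are made up for illustration only.
    voice_rate = TokenRate(usd_per_million_input=2.0, usd_per_million_output=8.0)
    print(f"voice session: ${token_cost(1_000_000, 500_000, voice_rate):.2f}")
    print(f"translation:   ${per_minute_cost(90, usd_per_minute=0.5):.2f}")
```

The practical point is that a token-billed voice session scales with how much is said and generated, while a minute-billed translation or transcription session scales only with how long the audio runs.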