OpenAI unveils three audio models for real-time voice tasks
(Reuters) - OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.
The launch of the application programming interface (API) moves the ChatGPT maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.
GPT-Realtime-2 is designed to manage harder requests, call tools, handle interruptions and maintain context across longer voice sessions.
A second model supports translation from more than 70 input languages into 13 output languages, targeting customer support, education and other multilingual settings.
GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.
