OpenAI unveils three audio models for real-time voice tasks

(Reuters) - OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.

The launch, delivered through OpenAI's application programming interface (API), moves the ChatGPT maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.

GPT-Realtime-2 is designed to manage harder requests, call tools, handle interruptions and maintain context across longer voice sessions.

The second model supports translation from more than 70 languages into 13 output languages, targeting customer support, education and other settings.

GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.