OpenAI’s new app, based on a system called GPT-4o, can be used on phones and desktop computers. Photo / Getty Images
As Apple and Google transform their voice assistants into chatbots, OpenAI is transforming its chatbot into a voice assistant.
The San Francisco artificial intelligence startup has unveiled a new version of its ChatGPT chatbot today (NZ time) that can receive and respond to voice commands, images and videos.
The company said the new app, based on an AI system called GPT-4o, juggled audio, images and video significantly faster than the previous version of the technology. The app is available from today, free of charge, for smartphones and desktop computers.
“We are looking at the future of the interaction between ourselves and machines,” Mira Murati, the company’s chief technology officer, said.
The new app is part of a wider effort to combine conversational chatbots such as ChatGPT with voice assistants such as the Google Assistant and Apple’s Siri. As Google merges its Gemini chatbot with the Google Assistant, Apple is preparing a new version of Siri that is more conversational.
OpenAI said it would gradually share the technology with users “over the coming weeks”. This is the first time it has offered ChatGPT as a desktop application.
It previously offered similar technologies within various free and paid products. Now, it has rolled them into a single system available across all its products.
During an event streamed on the internet, Murati and her colleagues showed off the new app as it responded to conversational voice commands, used a live video feed to analyse maths problems written on a sheet of paper and read aloud playful stories it had written on the fly.
The new app cannot generate video. But it can generate still images that represent frames of a video.
OpenAI previously offered a version of ChatGPT that could accept voice commands and respond with voice. But it was a patchwork of three different AI technologies: one that converted voice to text, one that generated a text response and one that converted this text into a synthetic voice.
The new app is based on a single technology, GPT-4o, that can accept and generate text, sounds and images. This meant the technology was more efficient and the company could afford to offer it to users for free, Murati said.
Written by: Cade Metz
© 2024 THE NEW YORK TIMES