Multimodal digital assistant OpenAI could launch soon



Edgar Cervantes / Android Authority

TL;DR

  • On Monday, OpenAI is hosting an event that could see the announcement of a new multimodal digital assistant.
  • Being multimodal would allow the assistant to use images for prompts, such as identifying and translating a sign in the real world.
  • This would pose a direct threat to Google’s digital assistants, namely Google Assistant and the new Gemini.

Over the past few weeks, rumors have swirled suggesting that OpenAI, the company behind ChatGPT, could soon launch an AI-powered search engine, posing a direct threat to Google's core business. Given how quickly ChatGPT has grown, such a product would represent the first real threat to Google Search in decades.

However, it now seems unlikely that OpenAI has a search engine on the way (via The Information). Instead, new rumors suggest that at OpenAI's planned event on Monday, the company could announce a multimodal digital assistant. Although it would not be a traditional search engine, it would still let users search for things using the power of AI, which would pose a significant threat to Google all the same.

Multimodal means the AI can handle multiple input forms, not just text. In the case of this digital assistant, it would be able to connect to a camera, process real-world information, and then respond with more information about what it sees. For example, you could point a camera at a sign in a different language and ask ChatGPT to identify and translate it, and the AI would speak its answer back to you.
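As a rough sketch of what multimodal prompting looks like in practice, here is how an image-plus-text request can be assembled in the style of OpenAI's existing chat completions format, which already accepts base64-encoded images alongside text. The model name and the exact use case (translating a photographed sign) are assumptions for illustration; this only builds the request body rather than making a live API call.

```python
import base64

def build_translate_request(image_bytes: bytes, model: str = "gpt-4o") -> dict:
    """Assemble a chat request pairing a text instruction with an image,
    asking the model to identify and translate a photographed sign."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                # A single message can mix text and image content parts --
                # this is what makes the prompt "multimodal".
                "content": [
                    {
                        "type": "text",
                        "text": "Identify this sign and translate it into English.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                    },
                ],
            }
        ],
    }

# Placeholder bytes stand in for a real camera capture.
request = build_translate_request(b"\xff\xd8\xff")
print(request["messages"][0]["content"][0]["text"])
```

The point of a unified assistant would be that the user never assembles anything like this by hand: the app would capture the image, attach the question, and speak the response, all in one interface.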

If this sounds familiar, that's because it's something Google Lens, Google Assistant, and, more recently, Google Gemini already do. In fact, ChatGPT can already do this too, just not through a single interface. In other words, Monday's launch could see the company announce an improved GPT model that offers faster, more accurate responses, with image input and spoken responses bundled into one app: a direct competitor to Gemini (and, by extension, Google Assistant and Apple's Siri).

To be clear, it almost certainly wouldn't be GPT-5, the long-awaited successor to GPT-4 and GPT-4 Turbo. The company has indicated that GPT-5 will not be announced at this event, and The Information suggests it won't land until late 2024.

