OpenAI tried to use Scarlett Johansson’s voice to make AI feel ‘comfortable’, and that’s exactly why it’s so worrying


What you need to know

  • Scarlett Johansson says she was approached by OpenAI last year to use her voice for a ChatGPT voice assistant.
  • Although Johansson declined the proposal, OpenAI shipped its GPT-4o model with a voice called “Sky” that closely resembles Johansson’s.
  • After legal pressure, OpenAI removed Sky from GPT-4o and said the voice was not based on Johansson’s.
  • Yet in trying to use a friendly, welcoming voice to make AI more comforting, OpenAI ended up achieving the opposite.

OpenAI made waves last week by announcing GPT-4o, a multimodal AI model that may be the most advanced and futuristic we’ve seen yet. It sounds remarkably human, can interact with users via vision and audio, and is knowledgeable. OpenAI also beat Google to the punch: GPT-4o appears more advanced than Project Astra, which Google previewed at Google I/O 2024.

But one of OpenAI’s chosen voices for GPT-4o has attracted attention online for all the wrong reasons. First, some social media users pointed out that the “Sky” voice was so seductive and sensual that it became unsettling. Then people began to notice the similarities between Sky’s voice and that of Scarlett Johansson, the award-winning actress. Now it appears this may have been intentional.

To be clear, OpenAI denies that Sky’s voice is based on Johansson’s and has even published a blog post explaining how the voices were chosen. However, Johansson released a scathing statement recounting how OpenAI approached her to officially voice GPT-4o, an offer she refused. After facing legal pressure from Johansson’s lawyers, the company removed GPT-4o’s Sky voice option.

As distressing as this situation is, it is almost ironic. OpenAI CEO Sam Altman told Johansson that her voice, as the official voice of ChatGPT, would be more comforting to users. And yet, by releasing a voice so similar to Johansson’s without her permission, Altman and OpenAI ended up perfectly encapsulating everything that makes people uncomfortable about AI.

Did OpenAI steal Scarlett Johansson’s voice?

(Image credit: Jay Bonggolto / Android Central)

Although OpenAI claims to have sought professional actors for GPT-4o rather than someone who specifically sounded like Johansson, the evidence might tell a different story. According to Johansson, it started in September 2023, when Altman approached her about hiring her as a voice actor for ChatGPT.

“He told me he thought that by giving voice to the system, I could bridge the gap between tech companies and creatives and help consumers feel comfortable with the seismic shift regarding humans and AI,” she said in a statement to NPR’s Bobby Allyn. “He said he thought my voice would comfort people.”

Johansson ultimately declined to voice GPT-4o. However, it’s easy to hear her resemblance in Sky’s voice, which ended up being demoed and shipped with the AI model. To say Johansson was unhappy with the result would be an understatement.

“We believe that AI voices should not deliberately imitate the distinctive voice of a celebrity. Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural voice,” OpenAI said in a blog post.

The only reason OpenAI wanted a voice like Johansson’s, as Altman reportedly told her, was to make AI more comforting. People may be more afraid of AI than they are excited about it. Those in the creative industries in particular are discovering that AI is being used to automate writing, visual arts, music, and other media. This isn’t unique to OpenAI: Apple recently came under fire and apologized for an ad that literally showed instruments crushed into pieces and replaced with an iPad.

By using her likeness in a GPT-4o voice without her permission, whether intentionally or not, OpenAI ended up validating the very discomfort with AI that it was desperately trying to solve. Creatives, from actors and actresses to writers and photographers, fear being replaced by AI. The idea that OpenAI might have mimicked Johansson’s voice for GPT-4o is exactly the sort of thing that has people in the creative industries worried.

“When I heard the demo released, I was shocked, angry and in disbelief that Mr. Altman was pursuing a voice that sounded so eerily like mine that my closest friends and the media couldn’t tell the difference,” Johansson wrote, explaining that she asked OpenAI to show how it developed the Sky voice. “At a time when we are all grappling with deepfakes and protecting our own image, our own work, our own identity, I think these are issues that deserve absolute clarity.”

We shouldn’t want AI to look so human

(Image: Google Gemini on Android. Credit: Future)

Aside from the troubling idea that a company could rip off an actress’ voice after she turned down a deal, there are other reasons why we shouldn’t want AI voices to sound like Sky’s. All of OpenAI’s GPT-4o voices, and especially Sky, sound very human. This is a problem because people place a high level of trust and familiarity in human voices. When you talk to a voice assistant like Siri or Alexa, it’s clear that you’re talking to, for lack of a better word, a robot. After a conversation with GPT-4o, that clarity won’t always be present in your mind.

Currently, AI models have a problem: they confidently state their answers as facts, even when those answers are demonstrably false. People often end up believing the AI’s answers despite the warnings that accompany them. As AI voices become more human-like, this problem will only get worse. It will be easier for the average user of an AI tool to believe what is being said thanks to the welcoming human voice saying it.

In trying to make people more comfortable with the future of AI, OpenAI ended up making it feel more dystopian. We shouldn’t want AI to appear as human as GPT-4o does, and there are many reasons for that. It could foster an unwarranted level of trust between users and AI models, and it puts creatives like Johansson in a precarious position.
