AI that will destroy society?

OpenAI has unveiled Voice Engine, an AI capable of cloning any voice from a 15-second sample. The tool is impressive, but it also presents huge risks, so much so that the company is reluctant to release it to the general public. Find out why…

The rise of deepfakes and the dangers that come with them are not going to fade away. Quite the opposite. After ChatGPT, DALL-E and Sora, OpenAI has just unveiled its new AI: Voice Engine.

This revolutionary tool is capable of cloning any human voice from a simple 15-second sample. It creates a synthetic copy of your voice.

Developed quietly over two years, it is an extension of the company's existing text-to-speech API. However, given the enormous risks involved, OpenAI is holding back from releasing it widely.

An AI secretly developed for two years

According to OpenAI's Jeff Harris, the generative AI model behind Voice Engine is actually the same one that ChatGPT uses for its voice features. It also powers the preset voices of the text-to-speech API.

Furthermore, following a confidential partnership, Spotify has been using it since September 2023 to dub some of the biggest podcasters into other languages. Lex Fridman is one example.

As for where the training data comes from, OpenAI declines to go into detail. The company would only say that the model was trained on a mix of publicly available and licensed data.

Presumably, a huge number of voice recordings were needed as examples to teach this AI to clone any voice. In any case, it is impossible to verify where they came from.

In the past, OpenAI has already been accused of stealing other people's intellectual property to train its AI. This includes photos, artwork, e-books or copyrighted lines of code.

In a recent statement to the UK House of Lords, the company admitted that it is "impossible" to build useful AI models without copyrighted content…

How does the Voice Engine clone voices so perfectly?

To create voices, this AI combines a diffusion process (as in Stable Diffusion) with a transformer architecture (as in GPT).

An audio sample and a text are combined to generate speech that matches the speaker's voice.

The sample and the text to be read aloud are analyzed together to generate the voice, without having to build a custom model for each user.

This technology is not new. Several startups have been offering voice cloning software for years, such as ElevenLabs, Papercup, Deepdub and Respeecher.

Even tech giants like Amazon, Google and Microsoft are developing such technologies. However, Voice Engine appears to reach a new level in terms of quality.

The end of human voice actors?

Above all, this new AI represents a threat to the voice dubbing industry. Many workers in this field are at risk of being replaced entirely by faster, cheaper artificial intelligence.

Companies and professionals know this, and they have already started to prepare. In 2023, Replica Studios signed an agreement with SAG-AFTRA to create voice replicas of members of the media artists union.

According to both organizations, this partnership sets up a fair framework to ensure that voice actors give their consent. Their synthetic voices can then be used for new projects, particularly in video games.

For its part, ElevenLabs has created a marketplace for synthesized voices. Users can create a voice, verify it, and then share it publicly.

Every time another user uses a voice, its creator gets paid. They can earn one dollar for every 1,000 characters read.

However, OpenAI does not intend to adopt such a strategy. The only condition will be that users obtain the explicit consent of the people whose voices are cloned and clearly disclose that the voices are AI-generated.

Moreover, using the voices of minors, deceased people or political figures will be prohibited. The San Francisco company says it is "very curious" to see how this new product will affect the voice dubbing industry…

A Pandora's box that should definitely not be opened

Beyond the risk to jobs, Voice Engine introduces serious security risks. This is why OpenAI has not yet set a release date: it first wants to determine how to prevent abuse.

In the past, users of the 4chan forum have already used the ElevenLabs platform to spread hate messages imitating the voices of celebrities like Emma Watson.

In one test, journalist James Vincent of The Verge was able to use AI tools to clone voices and generate violent threats as well as racist and transphobic comments.

Another journalist, Vice reporter Joseph Cox, created a voice clone convincing enough to fool a bank's authentication system.

With a tool as powerful as Voice Engine, there is concern that criminals will use AI to destabilize the presidential election. As early as January 2024, a phone campaign used a deepfake of Joe Biden to discourage New Hampshire citizens from voting.

To prevent such abuses, OpenAI has for now restricted access to Voice Engine to only about ten developers.

The company is also prioritizing the least risky and most socially beneficial use cases. These include applications in health and accessibility, and also in the media.

Among the early testers is Age of Learning, a company specializing in educational technology. It uses the tool to generate voice-overs from actors it has previously cast.

For its part, the storytelling app HeyGen uses this AI for translation. Livox and Lifespan, meanwhile, use Voice Engine to create voices for people who cannot speak or who have disabilities.

Another tester is Dimagi, which is building a tool based on this AI to give feedback to healthcare workers in their native languages.

In addition, clones generated with Voice Engine will carry a watermark. A system created by OpenAI embeds inaudible identifiers in the recordings. The company plans to make this system open source so that all similar tools can use it.

Next, OpenAI also plans to give its network of expert security testers access to Voice Engine so they can assess the risks and ways to mitigate them.

Unfortunately, it seems inevitable that people will find ways to bypass these safeguards and come up with misuses that OpenAI would never even have thought of…

Price and release date

Depending on how the preview goes and which risks are identified, OpenAI will decide whether or not to roll out the tool to more developers and possibly to the general public.

For now, the company prefers not to commit to anything. It is testing a safety mechanism that would require users to read a randomly generated text aloud, to ensure that they are present and aware of how their voice is being used.
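
The random-text check described above can be pictured with a small sketch. This is only an illustration of the general idea, not OpenAI's actual mechanism: the word list and the function names are hypothetical, and the transcript is assumed to come from a separate speech-recognition step that is not shown here.

```python
import re
import secrets

# Hypothetical word list, used only for this illustration.
WORDS = [
    "amber", "river", "candle", "orbit", "meadow",
    "lantern", "copper", "violet", "harbor", "thistle",
]


def make_challenge(n_words: int = 5) -> str:
    """Return a random phrase that the speaker must read aloud."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))


def _normalize(text: str) -> list[str]:
    """Lowercase the text, drop punctuation, and split it into words."""
    return re.sub(r"[^a-z\s]", "", text.lower()).split()


def matches_challenge(challenge: str, transcript: str) -> bool:
    """True if the transcript contains exactly the challenge words, in order."""
    return _normalize(challenge) == _normalize(transcript)


challenge = make_challenge()
print("Please read aloud:", challenge)
```

Because the phrase is generated fresh each time, a recording made in advance would not contain the right words, which is the point of asking for random text.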

OpenAI's priority is to ensure that people do not confuse human voices with AI-generated ones. Voice Engine will be released only if this condition is met.

According to marketing documents briefly published by the company, however, we do know that Voice Engine will cost $15 per million characters, or about 162,500 words.

This represents about 18 hours of audio, or just under $1 per hour. That is much cheaper than the competition: ElevenLabs, for example, charges $11 per month for 100,000 characters.
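
These figures can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are made up for this example, and the prices are the ones quoted in this article, not confirmed rates. Comparing a per-character price with a monthly subscription is also a simplification.

```python
# Figures quoted in the article (assumed here, not confirmed rates).
VOICE_ENGINE_PRICE_PER_MILLION_CHARS = 15.00  # dollars
WORDS_PER_MILLION_CHARS = 162_500
AUDIO_HOURS_PER_MILLION_CHARS = 18
ELEVENLABS_PRICE = 11.00                      # dollars per month
ELEVENLABS_CHARS = 100_000                    # characters included


def cost_per_audio_hour(price: float, hours: float) -> float:
    """Dollars per hour of generated audio."""
    return price / hours


def price_per_million_chars(price: float, chars: int) -> float:
    """Scale a price for a given character quota to one million characters."""
    return price * 1_000_000 / chars


voice_engine_hourly = cost_per_audio_hour(
    VOICE_ENGINE_PRICE_PER_MILLION_CHARS, AUDIO_HOURS_PER_MILLION_CHARS
)
elevenlabs_per_million = price_per_million_chars(ELEVENLABS_PRICE, ELEVENLABS_CHARS)
implied_words_per_minute = WORDS_PER_MILLION_CHARS / (AUDIO_HOURS_PER_MILLION_CHARS * 60)

print(f"Voice Engine: ${voice_engine_hourly:.2f} per hour of audio")
print(f"ElevenLabs: ${elevenlabs_per_million:.2f} per million characters")
print(f"Implied speaking pace: {implied_words_per_minute:.0f} words per minute")
```

On these numbers, Voice Engine comes to roughly 83 cents per hour of audio, the ElevenLabs plan is equivalent to $110 per million characters, and the implied speaking pace is about 150 words per minute.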

It is also significantly cheaper than human voice actors, whose rates range from $12 to $79 per hour according to ZipRecruiter.

However, it will not be possible to adjust the pitch, rate or tone of the voice, as other services allow. An HD option will be offered at double the price, but no details are known at the moment…

And you, what do you think of Voice Engine? Should we leave such a tool available to the general public or is it undeniably too dangerous? Share your opinion in the comments!
