Why is OpenAI delaying the release of its AI that mimics human voices?

A 15-second sample is enough for a near-identical imitation
Cautious approach to release, citing "concerns about potential misuse"

OpenAI has developed and unveiled a new artificial intelligence (AI) tool called ‘Voice Engine.’ The tool mimics a person’s voice to generate speech that sounds like the original speaker.

On the 29th (local time), OpenAI published preliminary experimental results of the Voice Engine in a blog post titled “Exploring the Challenges and Opportunities of Synthetic Voices.”

OpenAI explained, “We first developed the Voice Engine at the end of 2022 and used it to enhance voice features in ChatGPT’s speech recognition and reading functions, as well as in the text-to-speech conversion API (application programming interface).”

The company added, “To explore the potential applications of this technology, we began private testing with a small, trusted group late last year and were deeply impressed by the applications this group developed.”

OpenAI stated, “With just a 15-second voice sample, it is possible to generate speech that sounds similar to the original speaker’s voice.” In fact, the voice samples OpenAI released that day and the voices generated by the Voice Engine were so similar that it was difficult to distinguish between them.

Bloomberg News noted that “OpenAI’s release of a feature that can imitate human voices opens a new frontier in AI technology,” but added that “this raises concerns about the risks of deepfakes” (AI-generated manipulated videos, images, or audio).

In January, days before the New Hampshire primary in the United States, fake phone calls using a manipulated voice impersonating President Joe Biden were made to residents, urging them not to vote.

Given such risks, OpenAI is taking a cautious approach to a full release of the tool. The company stated, “We are approaching this carefully due to the possibility of misuse of synthetic voice features,” adding that “for now, we have decided only to preview this technology and not to release it widely.”

They continued, “We recognize that generating voices resembling human speech poses serious risks, especially in election years,” and emphasized, “We are working with various sectors including governments in the U.S. and abroad, media, entertainment, education, and civil society to incorporate their feedback.”

OpenAI sees potential for the tool to be put to positive use across various fields. Examples include voice narration for children’s educational content, real-time personalized response generation, and translation of content such as videos and podcasts into multiple languages. In fact, the translated audio samples in several languages that OpenAI released that day sounded almost identical to the original speaker’s own voice.

OpenAI added, “There have also been cases where this technology was used in therapeutic applications for patients with conditions that affect speech, and in communication devices for people with disabilities.”