OpenAI recently shared preliminary results and insights from a preview of Voice Engine, the company’s voice cloning AI model that has been in development since 2022. Voice Engine powers the Read Aloud feature in OpenAI’s hugely popular ChatGPT as well as the preset voices in the company’s text-to-speech API.
According to OpenAI, Voice Engine can generate a synthetic but natural-sounding voice from just a 15-second clip of someone’s voice. While OpenAI has offered a preview of Voice Engine, it is holding back a wider release, citing concerns about “the potential for synthetic voice misuse.”
The preview is meant to showcase Voice Engine’s capabilities. OpenAI has tested it privately with a small group of trusted partners, and these small-scale deployments have yielded key insights into the application’s potential use cases and the safeguards needed to prevent misuse.
One of the top use cases of the Voice Engine is to provide reading assistance using preset voices for non-readers and children. Age of Learning, an education technology company, is using the technology to create real-time, personalized responses to interact with students.
The technology can also translate content, such as the voices in a video or podcast, into multiple languages so that it reaches a global audience. In addition, Voice Engine preserves the native accent of the original speaker, so the generated voice has the same accent.
Voice Engine also offers support for people who are non-verbal, such as individuals with conditions that affect speech or learners with special educational needs. With Voice Engine, they can choose a realistic and consistent voice that best represents them. It can also help patients with sudden or degenerative speech conditions recover their voice: a short sample, even one taken from an old video, is enough to recreate a complete AI voice.
While OpenAI highlighted several use cases, it also shared some safety concerns. The small-scale deployments are enabling OpenAI to gather feedback about the technology across several industries including government, media, education, and healthcare.
All the trusted partners that were given access to Voice Engine agreed to OpenAI’s usage policies, which prohibit them from using the technology to impersonate another individual or organization. In addition, all the partners were required to obtain explicit and informed consent from the original speaker and to clearly disclose to their audience that the voices were AI-generated. However, the real challenges of this technology will emerge when it is released to the general public.
It’s an encouraging start that OpenAI has acknowledged the potential for misuse of the technology and is working to minimize the risks posed by AI voice generation.
OpenAI plans to implement a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how the technology is being used.
“We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures,” OpenAI said in its blog post.
With this being an election year in the U.S., OpenAI acknowledged the political risks of this rapidly evolving technology. Earlier this year, the FCC banned robocalls that use AI-generated voices after people reported receiving calls featuring an AI-cloned voice of President Biden.
The influence of the online ecosystem on democratic discourse is well-documented, and AI-powered voice generation tools could compound the problem by making it easier to impersonate real people. This calls for more research and resources to improve AI detection tools and more widespread education efforts to increase digital literacy in the AI era.
[Source: EnterpriseAI]