OpenAI Can Re-Create Human Voices—but Won’t Release the Tech Yet

Voice synthesis has come a long way since 1978’s Speak & Spell toy, which once wowed people with its state-of-the-art ability to read words aloud using an electronic voice. Now, using deep-learning AI models, software can not only create realistic-sounding voices but also convincingly imitate existing voices using small samples of audio.

Along those lines, OpenAI this week announced Voice Engine, a text-to-speech AI model for creating synthetic voices based on a 15-second segment of recorded audio. It has provided audio samples of the Voice Engine in action on its website.

Once a voice is cloned, a user can input text into the Voice Engine and get an AI-generated voice result. But OpenAI is not ready to widely release its technology. The company initially planned to launch a pilot program earlier this month that would let developers sign up for the Voice Engine API. But after further consideration of the ethical implications, the company decided to scale back its ambitions for now.

“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time,” the company writes. “We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models.”

Voice cloning tech in general is not particularly new—there have been several AI voice synthesis models since 2022, and the tech is active in the open source community with packages like OpenVoice and XTTSv2. But the idea that OpenAI is inching toward letting anyone use its particular brand of voice tech is notable. And in some ways, the company’s reticence to release it fully might be the bigger story.

OpenAI says the benefits of its voice technology include providing reading assistance through natural-sounding voices, enabling global reach for creators by translating content while preserving native accents, supporting non-verbal individuals with personalized speech options, and helping patients recover their own voices after speech-impairing conditions.

But it also means that anyone with 15 seconds of someone’s recorded voice could effectively clone it, which has obvious implications for potential misuse. Even if OpenAI never widely releases its Voice Engine, voice cloning has already caused trouble in society, both through phone scams in which someone imitates a loved one’s voice and through election campaign robocalls featuring cloned voices of politicians like Joe Biden.

Also, researchers and reporters have shown that voice-cloning technology can be used to break into bank accounts that use voice authentication (such as Chase’s Voice ID). In May 2023, that prompted US senator Sherrod Brown of Ohio, chair of the US Senate Committee on Banking, Housing, and Urban Affairs, to send a letter to the CEOs of several major banks asking what security measures they are taking to counter AI-powered risks.

OpenAI recognizes that the tech might cause trouble if broadly released, so it’s initially trying to work around those issues with a set of rules. It has been testing the technology with select partner companies since last year. For example, video synthesis company HeyGen has been using the model to translate a speaker’s voice into other languages while keeping the same vocal sound.

To use Voice Engine, each partner must agree to terms of use that prohibit “the impersonation of another individual or organization without consent or legal right.” The terms also require partners to obtain informed consent from the people whose voices are being cloned and to clearly disclose that the voices they produce are AI-generated. OpenAI is also embedding a watermark in every voice sample to help trace the origin of any voice generated by its Voice Engine model.

So, as it stands now, OpenAI is showing off its technology, but the company is not yet ready to put itself on the line for the potential social chaos a broad release might cause. Instead, the company has recalibrated its marketing approach to appear as if it is responsibly warning all of us about this already-existing technology.

“We are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse,” the company said in a statement. “We hope to start a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities. Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

In line with its mission to roll out the tech cautiously, OpenAI’s blog post offers three recommendations for how society should change to accommodate its technology. These steps include phasing out voice-based authentication for bank accounts, educating the public to understand “the possibility of deceptive AI content,” and accelerating the development of techniques that can track the origin of audio content, “so it’s always clear when you’re interacting with a real person or with an AI.”

OpenAI also says that future voice-cloning tech should require verifying that the original speaker is “knowingly adding their voice to the service” and creating a list of voices that are forbidden to clone, such as those that are “too similar to prominent figures.” That kind of screening tech may end up excluding anyone whose voice might naturally and accidentally sound too close to a celebrity or US president.

Tech Developed in 2022

According to the company, OpenAI developed its Voice Engine technology in late 2022, and many people have already been using a version of the technology with predefined (not cloned) voices in two ways: the spoken conversation mode in the ChatGPT app, released in September, and OpenAI’s text-to-speech API, which debuted in November of last year.

With all the voice-cloning competition out there, OpenAI says that Voice Engine is notable for being a “small” AI model (how small, exactly, we do not know). But since it was developed in 2022 and is only now being previewed, it almost feels late to the party. It may also not be perfect in its cloning ability: previous user-trained text-to-speech models, like those from ElevenLabs and Microsoft, have struggled with accents that fall outside their training datasets.

For now, Voice Engine remains a limited release to select partners.

This story originally appeared on Ars Technica.
