Spotify develops AI-powered voice cloning tool that can translate podcasts into multiple languages

Credit: Kaspars Grinvalds/Shutterstock

Spotify is piloting a new AI-powered Voice Translation tool for its podcasts.

Described by the company as “groundbreaking” in a press release on Monday (September 25), the new AI tool can translate podcasts into additional languages — all in the podcaster’s own voice.

As part of the pilot, Spotify has worked with podcasters Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett to generate AI-powered voice translations in other languages, including Spanish, French, and German, for a select number of catalog episodes and future episode releases.

Spotify is also looking to include other shows, such as Dax Shepard’s eff won with DRS, The Rewatchables and Trevor Noah’s new original podcast, which launches later this year.

According to Spotify, by using this tech, it can “match the original speaker’s style, making for a more authentic listening experience that sounds more personal and natural than traditional dubbing”.

See the new tool in action in the video below.



The company says that its newly developed tool leverages “the latest innovations” in AI. One of those innovations is new voice generation technology from OpenAI, the creator of the AI chatbot ChatGPT.

Coinciding with the launch of Spotify’s new tool, OpenAI revealed on its own website that it is rolling out “new voice and image capabilities in ChatGPT”.

Within that announcement, OpenAI claims that “ChatGPT can now see, hear, and speak” and explains that its “new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech”.

OpenAI says that it collaborated with professional voice actors to create each of the voices and that it also uses Whisper, its open-source speech recognition system, to transcribe your spoken words into text.


Spotify’s new pilot voice-translated episodes will be available worldwide to both Premium and Free users. The company says it will start by releasing an initial bundle of translated episodes in Spanish, with French and German rolling out in the coming days and weeks.


“By matching the creator’s own voice, Voice Translation gives listeners around the world the power to discover and be inspired by new podcasters in a more authentic way than ever before,” says Ziad Sultan, VP of Personalization.

“We believe that a thoughtful approach to AI can help build deeper connections between listeners and creators, a key component of Spotify’s mission to unlock the potential of human creativity.”

The platform also says that its announcement on Monday “is just the beginning” and that “it’s all part of Spotify’s commitment to continue empowering creators to bring their storytelling to more listeners across the world”.

Spotify’s new AI tool marks the company’s latest move in the world of artificial intelligence-powered audio.

In May, the company launched an AI ‘DJ’ feature that delivers a curated playlist of music alongside commentary on the tracks and artists it thinks you will like.

This commentary is narrated “in a stunningly realistic voice” using a “dynamic AI voice platform” from Sonantic, the London-based AI voice startup Spotify acquired in 2022.

Launched in December 2018 by Zeena Qureshi and John Flynn, Sonantic was built by founders with backgrounds ranging from speech and language therapy to Hollywood sound production.

Last year, Sonantic built a custom AI voice model for actor Val Kilmer, which was used in the most recent Top Gun film, Top Gun: Maverick.


As Spotify tests out its new AI translation tool for podcasts, a question many in the music business will be asking today is: when will we see similar technology piloted on this scale for artists?

In other words, when will a superstar artist like Taylor Swift be able to release a new track in multiple languages, in the artist’s own voice, on the day of release?

We caught a glimpse of how AI vocal technology can be used to do just that earlier this year when South Korea-based entertainment giant HYBE released a new single called Masquerade from an artist called MIDNATT.

HYBE claimed that the track was the “first-ever multilingual track produced in Korean, English, Japanese, Chinese, Spanish and Vietnamese”.

The multilingual track employs the technology of Supertone, the “fake voice” AI company HYBE acquired last year in a deal worth around $32 million, following an initial investment in the startup in February 2021.

Music Business Worldwide
