Spotify AI-Powered Translation Tool Has Podcasters Speaking in Global Tongues

Spotify is opening up foreign language markets to its podcasters through artificial intelligence.

The company on Monday announced a pilot program called Voice Translation for podcasts that not only translates a podcast from one language to another but also retains the podcaster’s voice in the process.

Spotify’s new translation tool, which uses OpenAI’s voice generation technology, can clone a speaker’s voice characteristics to make a translation sound more natural.

The pilot program will feature select podcasts from Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett, translated into Spanish, French, and German.

In the future, Spotify also plans to translate episodes of Dax Shepard’s “eff won with DRS,” “The Rewatchables” from The Ringer, and Trevor Noah’s new original podcast to be launched later this year.

“By matching the creator’s own voice, Voice Translation gives listeners around the world the power to discover and be inspired by new podcasters in a more authentic way than ever before,” Spotify Vice President of Personalization Ziad Sultan said in a statement.

“We believe that a thoughtful approach to AI can help build deeper connections between listeners and creators, a key component of Spotify’s mission to unlock the potential of human creativity,” he added.

Benefits for Podcasters and Spotify

The new translation tool has the potential to be beneficial to both podcasters and Spotify. “The Spotify proposal could extend the audience reach of these podcasts to new audiences and countries,” said Greg Sterling, co-founder of Near Media, a news, commentary, and analysis website.

“This potentially benefits both Spotify and the podcaster by expanding audience reach,” he told Arise Point.

English podcasts translated into Mandarin and Hindi would have access to some very large markets they wouldn’t have access to if the podcaster didn’t speak those languages, added Rowan Curran, analyst with Forrester Research, a national market research company headquartered in Cambridge, Mass.

“This represents a democratization of language AI capabilities,” he told Arise Point. “That’s following the pattern of the last couple of years of these really advanced functionalities becoming available to a very broad set of folks.”

Rob Enderle, president and principal analyst at the Enderle Group, an advisory services firm in Bend, Ore., pointed out that podcasters won’t only be adding to their audience but their wallets, too, as the more ears their podcasts capture, the greater the potential revenues they can generate.

The same is true for Spotify. “Each performer can generate more income; high performers will make the company much more money,” he told Arise Point.

Pressure To Make Investments Pay Off

Ashu Dubey, co-founder and CEO of Gleen, a generative AI company in Pleasanton, Calif., agreed that the translation tool could have a positive impact on Spotify’s bottom line.

“If there is a high-demand podcast that is only recorded in English, then this technology could expose that program to audiences in Japan or France, for example, and help Spotify sell more subscriptions in those countries,” he told Arise Point.

Spotify really needs to sell more subscriptions, maintained Todd Cochrane, CEO of Blubrry Podcasting, a podcast hosting and distribution service in Traverse City, Mich.

“They need bigger numbers of listeners to monetize against, as they are under extreme pressure to make their billion-dollar investments recover the money they have lost,” he told Arise Point.

Spotify has made some high-profile deals in recent years, including a US$200 million multi-year exclusive pact with podcaster Joe Rogan, $196 million for The Ringer sports and pop culture site, and $56 million for the Parcast production company, known for its true crime podcasts.

While Spotify is out front with its translation tool now, its lead could fizzle fast. “This is not just going to be Spotify’s technology,” Curran cautioned. “Spotify is the first big creator platform to do this, but it’s going to be a short time until we see this on platforms like YouTube.”

Potentially Dangerous Technology

Despite the benefits of Spotify’s new translation tool, its underlying technology has a dark side, too.

“The technology can be quite dangerous and potentially exploitative,” Sterling said. “It’s already being used in frauds and scams. And there are unauthorized uses of celebrity voice clones already happening in audiobook recordings.”

“It needs to be used with caution and in every case with the subject’s permission,” he continued. “But the power imbalance between platforms and individuals on them may not generate equitable use cases of voice AI. There need to be clear, ethical guidelines in place.”

“This is one of the issues in the still-unsettled actors’ strike. Do the studios have a right to exploit an actor’s voice and image in perpetuity without permission?” he added.

Dubey pointed out that the translation tool could be subject to that bane of AI applications: hallucinations.

“This could happen if the podcaster were to use a phrase that didn’t really have an equivalent phrase in the language being translated,” he explained.

“For example,” he continued, “the German term ‘schadenfreude’ doesn’t really have a strict translation in most languages, so an AI that is relying solely on a large language model could end up hallucinating the translation and putting words in the podcaster’s mouth.”

Execution Key to Success

Translations could create legal problems for podcasters, too.

“If the AI technology fails to provide an accurate translation of a podcast creator’s content, the podcast creator could face legal consequences, such as defamation or FTC violations,” noted Alyssa J Devine, CEO and founder of Purple Fox Legal, a law firm with a focus on intellectual property law for entrepreneurs and creatives, in Nashville, Tenn.

“The appropriate jurisdiction and venue for such claims would depend on the facts of a specific situation, but it is not unheard of for a plaintiff in one country to obtain a judgment against a defendant in another country,” she told Arise Point.

Execution will be a key to success for Voice Translation, Cochrane maintained.

“If Spotify does not execute this well, it could do the opposite and hurt all podcast content across the platform and turn those non-English native listeners off to the content,” he said. “It’s a real risk if it sounds synthetic and without inflection.”

Mark N. Vena, president and principal analyst of SmartTech Research in San Jose, Calif., and also a podcaster, explained that translating podcasts can be challenging.

“When you translate things into different languages, everything said in one language can’t be cleanly translated into another,” he said.

“If the accuracy of the translation isn’t very good, that’s going to be a problem,” he continued. “There’s also going to be a problem with cleaning up some of the artifacts of a podcast — the ‘ums’ and ‘ahs’ and awkward gaps.”

“I’m very skeptical of how effective this will be,” he asserted.