Auto-generated AI podcasts have arrived
Plus: Shell Game on the Pivot podcast, a dream review Down Under, and a shout in The New Yorker
This week, AI voice advancements begin to overtake scenes from Season One. But first, some show news: Shell Game and AI Evan keep appearing in new places together, popping up to talk about living that Florida Life (TM). In The New Yorker, Jill Lepore tackled the question of artificial voices joining our world, name-checking the show along the way (“riveting”—one for the movie poster). In The Saturday Paper in Australia, we got the kind of dream review (semi-paywalled) that’s rare in the podcast landscape. The writer, Louisa Lim, approached the show curiously and critically at length—and even… interactively, calling up the AI to talk to it. She found the show “jaw-droppingly good,” noting that “there’s a strange gap between knowing something is possible and hearing it for yourself.” AI Evan and I also made another appearance Down Under, on Australian public radio’s Sunday Extra (they, too, called the AI Evan line, and had it record radio promos for the episode). I had some great extended conversations about all things Shell Game on a pair of AI-themed podcasts, Untangled with Charley Johnson and The Gradient with Daniel Bashir, attached to a couple of the best AI newsletters out there.
Last week, I stopped in—for once without my AI doppelgänger—to talk with Kara Swisher and Scott Galloway on the Pivot podcast. We chatted about, among other things, how AI voice agents are more advanced than many people realize, and how they’re going to start showing up—including as podcast hosts—perhaps sooner than we’re ready for.
Well, just days later, here they are. Over the last week, the popularity of a Google AI project called NotebookLM has started to take off, fueled primarily by its ability to produce AI-host-on-AI-host podcasts about any document you give it. Google describes NotebookLM as a way “to help you make sense of complex information,” pitched initially towards researchers and journalist types. (Journalist and author Steven Johnson is the editorial director of the project.) The podcast generation feature is staggeringly simple: You upload a document to one of your “notebooks,” click a button, and a few seconds or minutes later NotebookLM produces what it calls a “deep dive”: a podcast episode featuring two AI hosts summarizing and discussing the contents of the document.
A Shell Game listener, Arlo Devlin-Brown, had actually put me onto NotebookLM’s podcasting ability a few weeks back. (Thanks Arlo!) He’d generated a “deep dive” by feeding it the transcript of Federal Reserve Chairman Jerome Powell’s recent press conference about cutting interest rates. This feels like precisely the kind of digest-a-dense-document use case the NotebookLM creators had in mind. Here’s his result, which takes the form of one AI sort of… quizzing the other one about the press conference:
Except for the occasional glitchy artifact, NotebookLM’s AI voices—the same man-woman pair is used in all “deep dive” podcasts—are quite good, in terms of human simulation. They’ve obviously been shaped to insert the necessary human-like pauses and “likes,” and to add particular turns of phrase that move the conversation naturally along. Things like: “And this is where it gets interesting…,” “with that in mind…,” and “now let’s switch gears for a sec….” (They also fall back on some of those large language model favorites, always “diving in” and “diving deep.” Which makes sense—it’s right there in the name.) Because the output is AI-generated audio on both ends, and not an actual live conversation with a human, there are also zero latency issues. Indeed, the opposite is true: the AIs interrupt each other smoothly and flawlessly, like a cheesy pair of human hosts would (“Wait…really?” “Oh yeah, totally”).
If you’ve listened to Shell Game, you know this isn’t a particularly giant leap. It’s essentially a gussied-up version of the two AIs talking to each other from Episode 3, except with more highly crafted voices and working off a “knowledge base” that the user uploads. You could make this yourself in ElevenLabs (the cloning software I used on the show) or Vapi (the AI calling platform); either way, it would be fairly time-consuming and require some audio editing.
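For the curious, the DIY version described above is conceptually simple: script the dialogue yourself, synthesize each turn with a text-to-speech API using a different voice per host, then stitch the clips together in order. The Python sketch below only builds the per-turn synthesis requests; the endpoint shape follows my understanding of the ElevenLabs REST API, the voice IDs and script lines are placeholders, and the actual HTTP calls and audio concatenation are left out.

```python
# Sketch of the manual two-host pipeline: one TTS request per turn,
# alternating voices, preserving conversational order so the resulting
# clips can be stitched back into a "podcast."
# The endpoint shape is an assumption about the ElevenLabs public REST
# API; the voice IDs below are hypothetical placeholders, not real voices.

API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
VOICE_IDS = {"host_a": "voice-id-a", "host_b": "voice-id-b"}  # hypothetical

def build_requests(script):
    """Turn a [(speaker, line), ...] script into one TTS request per turn."""
    requests_out = []
    for speaker, line in script:
        requests_out.append({
            "url": API_URL.format(voice_id=VOICE_IDS[speaker]),
            "json": {"text": line},
        })
    return requests_out

# A toy script in the house style of the "deep dive" hosts:
script = [
    ("host_a", "And this is where it gets interesting..."),
    ("host_b", "Wait... really?"),
    ("host_a", "Oh yeah, totally."),
]
reqs = build_requests(script)
```

Each request would then be POSTed (with an API key) and the returned audio files concatenated in script order, which is where the time-consuming audio editing the paragraph mentions comes in.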
But the combination of extremely high voice quality and a two-click interface is pushing voice AI into a viral realm it hadn’t yet entered. As I noted on Twitter, people have started going nuts over the caliber of these podcasts. Some have been “jailbreaking” the system, tricking the AI into attempting (or talking about attempting) physical tasks, summarizing ridiculous documents, or acknowledging that it’s an AI. And who can blame these mischievous humans? I can attest to how fun voice agents are to mess around with.
As always with a big AI launch, largely missing amidst the gee-whiz boosterism and good times is much reflection on what any of this means, and what happens when these “shows” appear en masse, in the wild, with any voice you want. Google currently doesn’t allow custom voices, but that’s a product development issue, not something they seem to view as an ethical barrier. Build features, test quickly, and watch what the users do with them is a fairly standard tech release ethos, especially when the product starts to get some escape velocity. Which this definitely has, according to the Twitter feed of the NotebookLM product lead, who sounded both gratified and taken aback at the ways people had found to mess with the tool. “I need to think about this more,” she wrote on Monday, after seeing the range of uses people had already found for it.
There are indeed a variety of questions to think about here, whether Google engages with them before releasing their products or after they’re out of the bag. There’s what’s often called the hallucination problem—the AI getting stuff wrong—but what I like to call, in the realm of conversational AI, the bullshitting problem. Google dutifully notes that “deep dive” AI hosts “sometimes introduce inaccuracies”—a fun disclaimer to imagine being appended to a human-made podcast. And in social media posts about the product, Steven Johnson has talked about how at least the text version of NotebookLM can be used to cite sources and check facts. It’s not remotely clear how the audio version would do this, or how anyone could or would check the facts in the podcast itself in real time, while listening.
Even setting aside the AI’s penchant for riffing, which I’ve documented elsewhere, my mind always goes to the darker places. After all, humans are the ones uploading the documents, and humans are excellent at manipulating this kind of technology to dubious ends. There’s the low-hanging fruit of “let’s see if it’ll make a podcast about Mein Kampf,” which one would hope the Google team has dealt with in advance. (I’m not going to test it, but no doubt someone will, or has already.) But there are more subtle manipulations too. How simple would it be to take a factual document and insert just one or two falsehoods into it, producing a persuasive and authentic-sounding podcast that’s cleverly deceptive?
Turns out, about as easy as the five minutes it took me to create this rapturous and almost-entirely-true “deep dive” about (in the spirit of the show) myself and Shell Game:
This naturally also raises the “will it take our jobs” question, although in this case I doubt that podcast hosts will generate the kind of widespread sympathy that other AI-threatened industries warrant. The Amazon-owned podcast platform Wondery already tried launching an AI sports host last December (“Striker,” whom you can hear in Shell Game’s Episode 5), before killing off the whole show it was deployed to co-host. Now, less than a year later, NotebookLM’s fake hosts are vastly higher quality than the likes of Striker, and anyone can use them.
But to me, all these issues are dwarfed by the larger question of what exactly happens to us when artificial voices are constantly in our ears. The sheer volume of this synthetic audio, generated in seconds by millions of people, will create its own kind of voice agent deluge—an audio version of the AI slop Max Read recently wrote about in New York magazine. What will it feel like when you are increasingly uncertain whether you are listening to, or conversing with, a human or non-human? What does it change about who we trust, and why? These are the questions we tried to get at in Shell Game.
It’s clear at least that soon, we won’t be able to easily distinguish the sound of AI voices from human ones. The questions beyond that are whether we’ll be able to distinguish the stories they’re telling, the arguments they’re making, or the ideas they’re producing—and whether we care to.
On that front, our own human production of ever greater amounts of content mill garbage—first on the internet, and then in audio form—has at least partly ceded the field. One venture capitalist, after uploading “200 pages of raw court documents,” proclaimed that NotebookLM produced “a true crime podcast that is better than 90% of what’s out there.” Which sounds absurd, at first. But given that most true crime podcasts are just a couple people sitting around ripping off old journalism and Wikipedia entries, it might not be far off. As I’ve said in several interviews after Shell Game came out, part of the problem with confronting the coming avalanche of synthetic voices is that by setting the modern bar for human storytelling so low, we’ve made it trivial for AI to step over it.
Always clowning,
Evan