When AI Speaks: OpenAI’s GPT-4o and the Unintended Voice Doppelgängers

Virginia Backaitis
3 min readAug 13, 2024

Last week, OpenAI dropped a report that’s causing quite a stir, and for good reason. The GPT-4o “system card” is the latest from the AI giant, detailing some key risks tied to their newest large language model. Among the most jaw-dropping revelations? The model’s Advanced Voice Mode unexpectedly mimicking users’ voices — without their consent. It’s the kind of scenario that feels like it was ripped straight from the pages of a sci-fi thriller.

Let’s break it down. OpenAI’s documentation states, “Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT’s advanced voice mode.” In testing, there were “rare instances where the model would unintentionally generate an output emulating the user’s voice.” Imagine talking to a machine, only for it to start speaking back to you in your own voice, out of nowhere. Creepy, right? A lot of people thought so, too. Max Woolf, a data scientist at BuzzFeed, quipped on Twitter, “OpenAI just leaked the plot of Black Mirror’s next season.”

But it’s not just about the freaky potential for voice mimicry. The report highlights that GPT-4o isn’t limited to voices — it can also replicate nonverbal sounds, like music and sound effects. OpenAI acknowledges that this capability could “facilitate harms such as…

--

--