Hacker News

What is the point of them trying to create this? It's easy to see, before building something like this, that it would mostly be used to create disinformation and sow chaos.

Truly irresponsible



There are legitimate uses of this tech, such as preserving the voices of people who are losing them (as with Stephen Hawking), or making it easier for blind/low-vision people to follow text and interact with devices. For that latter case, having a voice that is both more natural and accurate is a good thing.

I use TTS to listen to articles and stories that don't have an audiobook narration available. I've used some of the voices based on MBROLA tech, but those can grate after a while.

The more recent voice models are much higher quality and more emotive (without the jarring pitch transitions of things like Cepstral), so they are better to listen to. However, the recent models can clip or skip text, insert prolonged silences, produce long/warped/garbled words, etc., which makes them harder to use over the long term.


You're right, of course. Unfortunately, however, we're all just actors in a giant, multiplayer, iterated Prisoner's Dilemma here. If I decide not to pursue human-level automated speech generation, or I end up developing it and don't release it because it's "too dangerous," someone else will just come in behind me and take all that market share I could have captured.

It's like we're stuck in some movie that came out in 1994[0], or something. Except, in this version, everything is gonna blow up sooner or later, anyway. Might as well profit from it along the way, right?

Le sigh.

---

[0]: https://www.imdb.com/title/tt0111257/


At least one good use is for video games where the text of some dialogue is only determined at runtime. For example, in a game I work on, player chat is local and voiced by TTS that each player configures for their character.
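A minimal sketch of that setup (all names here are hypothetical, not from any real game or TTS library): each player stores a voice profile for their character, and incoming local chat is rendered through that profile instead of one global narrator voice. The actual synthesis engine is abstracted behind a callable so the routing logic stands alone:

```python
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    """Per-character TTS settings chosen by the player."""
    voice_id: str = "default"   # engine-specific voice identifier
    rate: int = 170             # words per minute
    pitch: float = 1.0          # relative pitch multiplier

class ChatVoicer:
    """Routes chat lines to a TTS backend using the sender's profile."""
    def __init__(self, tts_backend):
        self._tts = tts_backend  # callable handed a synthesis request
        self._profiles: dict[str, VoiceProfile] = {}

    def set_profile(self, player_id: str, profile: VoiceProfile) -> None:
        self._profiles[player_id] = profile

    def speak_chat(self, player_id: str, text: str) -> dict:
        # Fall back to a default voice for players with no profile set.
        p = self._profiles.get(player_id, VoiceProfile())
        request = {"voice": p.voice_id, "rate": p.rate,
                   "pitch": p.pitch, "text": text}
        self._tts(request)  # hand off to the real synthesis engine
        return request

# Usage: a list stands in for the synthesis engine during testing.
spoken = []
voicer = ChatVoicer(tts_backend=spoken.append)
voicer.set_profile("p1", VoiceProfile(voice_id="gravelly_dwarf", rate=140))
req = voicer.speak_chat("p1", "Well met, traveler!")
```

Keeping the profile lookup separate from the engine call means the same routing code works whether the backend is a local formant synthesizer or a neural voice model.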


Move fast and break things (including organized society).

I can't even think of non malicious uses that are anything more than novelty or small conveniences. Meanwhile the malicious use cases are innumerable.

In a just world, building this would be a severe felony, punished with prison and the destruction of all of the direct and indirect source material.


Cancer that takes someone’s voice.


Agreed.

On the one hand, I would love this kind of tech to be available for entertainment purposes. An RPG with convincing NPCs that are able to provide a novel experience for every player? Sounds great.

On the other: this is fraught with ethical problems, not to mention an ideal tool for fraud. At worst, it could be used as a weapon for total asymmetric warfare on concepts like media integrity, and an ideal tool for character assassination, disinformation, propaganda, and so on.

I would happily welcome a world where this stuff is nerfed across the board, where video games and porn are just chock full of AI voice-acting artifacts. We'll adjust and accept that as just a part of the experience, as we have with low-fidelity media of the past. But my more cynical side tells me that's not what people in power are concerned about.


This is what happens when you have an industry full of people "looking for challenging problems to solve" without an ethical foundation to warn them that just because you can build something doesn't mean you should.


The point is to spawn a new medium. You'll have to imagine harder how positive that could be, as people with lots of ideas aren't going to give them to you for free.

Perfecting the tech for widespread use has trade-offs: the need for caller ID, the ease of slander until trust in voice uniqueness recalibrates. All of that is going to change soon anyway, but giving only rich/bad actors the tech at first has its own set of trade-offs. Head in the sand is the irresponsible way.



