
Yes, I think you are right. When I did the math on ElevenLabs' price per million characters (Pro plan), I got the same numbers.

I'm super happy about this, since I took a bet that exactly this would happen. I've just been building a consumer TTS app that could only work with significantly cheaper TTS prices per million characters (or self-hosted models).



Kokoro TTS is pretty good for open source. Worth checking out.


Yes, Kokoro is great, and the language flexibility is a huge plus too. And you get the best price per character, for sure, if you self-host.


Oh man, they have the "Sky" voice, and it seems to be the same one that OpenAI had but then removed? Not sure how that's possible, but I'm very happy about it.


> Not sure how that's possible

Download a bunch of movies Scarlett Johansson has been in, segment them into audio clips where she talks, and train the model :)


Is it actually her? I didn't think it was, but maybe.


Unless there is some leak from OpenAI, I'm not sure we'll ever have it confirmed either way. But my brain thought it was Johansson from the first few seconds I heard the voice, and I don't seem to be alone in that reaction. The fact that they removed the voice also suggests it was trained on her voice.

Listening to it again today with fresher ears (the original OpenAI Sky, not the clones elsewhere), I still hear Johansson as the underlying voice actor for it, but maybe there is some subconscious bias I'm unable to bypass.


Hmm, I never thought it was her. Her voice is much more raspy, whereas Sky is a bit lighter. I can hear the similarity, I just don't think they sound exactly alike.

As you say, I'm not sure we'll ever know, although the Sky voice from Kokoro is a spot-on match for the Sky voice from OpenAI, so maybe someone from Kokoro knows how they got it.


What does it do?


Convert any file (pdf, epub, txt) to an audiobook, downloadable as mp3, or directly listenable via RSS feed in, say, the Apple Podcasts app.

Basically make one-off audiobooks for yourself or a few friends.


For anyone else reading this, Librera Reader + SherpaTTS are both FOSS Android apps and can read anything Librera can open on an ad-hoc basis, with no need to futz with files. Just load your ebook bookmark and hit play.

SherpaTTS has a bunch of different models (Piper/Coqui) with a ton of voices/languages. There's a slight but tolerable delay with the Piper high-quality models, but the low-quality ones run in realtime.


Any plans to make a Chrome extension variant? Been looking for a high quality and cheap TTS extension for ages (like ElevenLabs Human Reader, except with less absurd pricing)


I didn't think of that, interesting idea. What I'm focusing on right now is long-form content for more offline-ish listening. Maybe a plugin could work to load longer texts, but I'm not working on a screen reader atm.


Do you know if there are any offerings today that can read math? Like speak an equation the way a human would? It's something I've been thinking about for a long time and would be an essential feature for me (the only things I read are physics).


I saw a small model trained on outputting currency-aware text from decimals/integers.

I wonder if you could make a similar -narrow- LoRA finetune to train a model to output human-readable text from, say, LaTeX formulas, given a good dataset to train on.
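
To make the idea concrete, here is a rough Python sketch of the kind of LaTeX-to-spoken-text mapping such a dataset would contain. This is a hand-written rule-based stand-in, not a LoRA finetune, and the names (`SYMBOLS`, `speak_latex`) and the tiny rule set are made up for illustration. Real formulas would need far more coverage.

```python
import re

# Hypothetical, deliberately tiny vocabulary; a real system needs many more rules.
SYMBOLS = {
    r"\alpha": "alpha",
    r"\pi": "pi",
    r"\hbar": "h bar",
    "=": "equals",
    "+": "plus",
    "-": "minus",
}

# Matches \frac{a}{b} where neither argument contains nested braces.
FRAC = re.compile(r"\\frac\{([^{}]*)\}\{([^{}]*)\}")
# Matches ^2 or ^{2}, but not ^20 or ^{20}.
SQUARED = re.compile(r"\^\{?2\}?(?!\d)")


def speak_latex(src: str) -> str:
    """Turn a small subset of LaTeX into text a TTS engine can read aloud."""
    # Rewrite fractions repeatedly so nested ones resolve from the inside out.
    prev = None
    while prev != src:
        prev = src
        src = FRAC.sub(r" \1 over \2 ", src)
    src = SQUARED.sub(" squared ", src)
    # Replace known symbols with their spoken form.
    for symbol, word in SYMBOLS.items():
        src = src.replace(symbol, f" {word} ")
    # Collapse the extra whitespace introduced above.
    return " ".join(src.split())


if __name__ == "__main__":
    print(speak_latex(r"E = mc^2"))
    print(speak_latex(r"\frac{a}{b} + \pi"))
```

Pairs like (`E = mc^2`, "E equals mc squared") are what a finetune would learn from. The rules above break down quickly on real physics notation, which is where a trained model might do better.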


What is your use-case here?


Primarily for reading articles aloud online. I've been trying the latest Siri TTS which is a big improvement (and free), but it's still nowhere near accurate enough for proper nouns or newer terms, which ElevenLabs handles much better.


Same for me :)



