Yes, I think you are right. When I did the math on ElevenLabs' price per million characters (Pro plan), I got the same numbers.
I'm super happy about this, since I took a bet that exactly this would happen. I've just been building a consumer TTS app that could only work with significantly cheaper TTS prices per million characters (or self-hosted models).
Oh man, they have the "Sky" voice, and it seems to be the same one that OpenAI had but then removed? Not sure how that's possible, but I'm very happy about it.
Unless there is some leak from OpenAI, I'm not sure we'll ever have it confirmed one way or the other. But my brain thought it was Johansson from the first few seconds I heard the voice, and I don't seem to be alone in that reaction. The fact that they removed the voice also suggests it was trained on her voice.
Listening to it again today with fresher ears (the original OpenAI Sky, not the clones elsewhere), I still hear Johansson as the underlying voice actor for it, but maybe there is some subconscious bias I'm unable to bypass.
Hmm, I never thought it was her, her voice is much more raspy, whereas Sky is a bit lighter. I can hear the similarity, I just don't think they sound exactly alike.
As you say, I'm not sure we'll ever know, although the Sky voice from Kokoro is a spot-on match for the Sky voice from OpenAI, so maybe someone from Kokoro knows how they got it.
For anyone else reading this, Librera Reader + SherpaTTS are both FOSS Android apps and can read anything Librera can open on an ad-hoc basis, with no need to futz with files: just load your ebook bookmark and hit play.
SherpaTTS has a bunch of different models (Piper/Coqui) with a ton of voices/languages. There's a slight but tolerable delay with the Piper high-quality models, but the low-quality ones are realtime.
Any plans to make a Chrome extension variant? Been looking for a high quality and cheap TTS extension for ages (like ElevenLabs Human Reader, except with less absurd pricing)
I didn't think of that, interesting idea. What I'm focusing on right now is long-form content for more offline-ish listening. Maybe a plugin could work to load longer texts, but I'm not working on a screen reader atm.
Do you know if there are any offerings today that can read math? Like speak an equation the way a human would? It's something I've been thinking about for a long time and would be an essential feature for me (the only things I read are physics).
I saw a small model trained to output currency-aware text from decimals/integers.
I wonder if you could make a similarly narrow LoRA fine-tune that trains a model to output human-readable text from, say, LaTeX formulas, given a good dataset to train on.
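To make the LaTeX-to-speech idea concrete, here is a minimal rule-based sketch in Python. It is not the fine-tuned model discussed above, just a hand-written baseline that could plausibly be used to sanity-check outputs or bootstrap training pairs. The function name `latex_to_speech`, the tiny `GREEK` and `SYMBOLS` tables, and the spoken phrasings are all assumptions for illustration; real LaTeX would need a proper parser and far more rules.

```python
import re

# Hypothetical, deliberately tiny rule tables (assumptions, not a standard).
GREEK = {r"\alpha": "alpha", r"\beta": "beta", r"\pi": "pi", r"\omega": "omega"}
SYMBOLS = {"=": " equals ", "+": " plus ", "-": " minus "}


def latex_to_speech(src: str) -> str:
    """Turn a very small subset of LaTeX math into speakable text."""
    text = src
    # Rewrite brace-free \frac{a}{b} and \sqrt{a} repeatedly, so that
    # nested expressions are resolved from the inside out.
    prev = None
    while prev != text:
        prev = text
        text = re.sub(r"\\frac\{([^{}]*)\}\{([^{}]*)\}", r" \1 over \2 ", text)
        text = re.sub(r"\\sqrt\{([^{}]*)\}", r" the square root of \1 ", text)
    # Exponents: special-case "squared", then braced and single-character powers.
    text = re.sub(r"\^(?:\{2\}|2(?!\d))", " squared ", text)
    text = re.sub(r"\^\{([^{}]*)\}", r" to the power of \1 ", text)
    text = re.sub(r"\^(\w)", r" to the power of \1 ", text)
    # Spell out the Greek letters and operators we know about.
    for command, word in GREEK.items():
        text = text.replace(command, f" {word} ")
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, word)
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(text.split())


if __name__ == "__main__":
    for formula in (r"E = mc^2", r"\frac{a}{b} + \sqrt{x}", r"\omega = 2\pi f"):
        print(formula, "->", latex_to_speech(formula))
```

A rule set like this breaks down quickly on real physics papers (ambiguous grouping, subscripts, integrals), which is roughly where a narrow fine-tune trained on formula/spoken-text pairs might do better.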
Primarily for reading articles aloud online. I've been trying the latest Siri TTS which is a big improvement (and free), but it's still nowhere near accurate enough for proper nouns or newer terms, which ElevenLabs handles much better.