I would highly recommend Gemini 2.5 Pro too for its speech quality. It's priced lower and the quality is top notch on their API. I made an implementation here in case you're interested https://www.github.com/akdeb/ElatoAI but it's on hardware, so maybe not totally relevant.
I'm using LiveKit, and I have indeed tested Gemini, but it appears to be broken or at least incompatible with OpenAI. Not sure if this is a LiveKit issue or a Gemini issue. Anyway, I decided to go back to just using LLM, STT and TTS as separate nodes. I've also been looking into the Deepgram Voice Agent API, but LiveKit doesn't support it (yet?).
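For anyone curious, the separate-nodes setup boils down to roughly the following. This is only a minimal sketch with stub stages: `run_turn`, `fake_stt`, `fake_llm` and `fake_tts` are made-up names for illustration, not LiveKit or provider APIs.

```python
from typing import Callable

# Hypothetical stage signatures; a real STT, LLM or TTS provider
# would be wrapped behind each of these.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def run_turn(audio_in: bytes, stt: SpeechToText,
             llm: LanguageModel, tts: TextToSpeech) -> bytes:
    """Run one conversational turn through three independent stages."""
    transcript = stt(audio_in)   # audio -> text
    reply = llm(transcript)      # text -> text
    return tts(reply)            # text -> audio


# Stub stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    print(run_turn(b"hello", fake_stt, fake_llm, fake_tts))
```

The upside of keeping the stages separate is that any one of them can be swapped for a different provider without touching the other two.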
It's as if the rubber duck were actually on the desk while you're programming, and if we had an MCP that could get live access to the code, it could give you realtime advice.
Wow, that's really cool, thanks for open sourcing! I might dig into your MCP. I've been meaning to learn how to do that.
I genuinely think this could be great for toys that kids grow up with, i.e. the toy could adjust the way it talks depending on the kid's age and remember key moments in their life. Could be pretty magical for a kid.
Thank you! It's been super fun to work on. The challenges were more on the ESP32 side, like getting audio to work smoothly with Opus and dealing with audio timing. This is one of the reasons I open-sourced it.
It seems pointless to think that everyone should have to cross that C++/audio barrier to make something cool. Using this cuts a lot of dev time and gets products out to market way quicker. The repo basically helps you launch your AI toy brand.