Isn't the inference cost of running these models at scale challenging? Currently it feels like small LLMs (1B-4B) are able to perform well for simpler agentic workflows. There are definitely some constraints, but surely it's much easier than paying for big cloud clusters to run these tasks. I believe it distributes the cost more uniformly.
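To be concrete, the sketch below is the kind of simple workflow I have in mind: a small instruction-tuned model deciding which tool to call. The model name and tool list are just illustrative, and it assumes the Hugging Face transformers library rather than any particular on-device runtime.

```python
# Minimal sketch: a ~1.5B instruction-tuned model handling a single
# "agentic" step (tool routing) entirely on local hardware.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",  # illustrative small model, swap in your own
)

# A trivial routing step: ask the model to pick a tool for a user request.
prompt = (
    "You can call one of these tools: search(query), calculator(expression).\n"
    "User request: what is 17% of 2,340?\n"
    "Respond with a single tool call."
)

out = generator(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
```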
It is very likely that you consume less power running a 1B LLM on an Nvidia supercluster than you do trying to download and run the same model on a smartphone. I don't think people understand just how fast the server hardware is compared to what is in their pocket.
We'll see companies push for tiny on-device models as a novelty, but even the best of those aren't very good. I firmly believe that GPUs are going to stay relevant even as models scale down, since they're still the fastest and most power-efficient solution.
I would love to talk more about it and understand your take. Is there a way I can reach out to you? I have been reading up on Web3 and forming my own opinion on its merits and demerits.
I believe edge computing is one of the game-changing technologies of the decade. I am not sure whether it falls under the purview of Web3 or not. For example, one of the libraries I implemented was along the lines of training ML models on user devices instead of in the cloud, preserving privacy while still allowing personalization (roughly the idea sketched below).
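To give a flavor of what I mean by on-device training, here is a minimal, hypothetical sketch in PyTorch. The model, data, and hyperparameters are placeholders and not the actual library; the point is just that the whole training loop runs on the user's device, so raw data never leaves it.

```python
import torch
from torch import nn

class TinyPersonalizer(nn.Module):
    """A small personalization head cheap enough to train on-device."""
    def __init__(self, n_features: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)

def train_on_device(local_data: torch.Tensor, local_labels: torch.Tensor):
    """Train only on data stored locally; only the updated weights
    (or nothing at all) would ever need to leave the device."""
    model = TinyPersonalizer(local_data.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(20):  # a few quick local epochs
        opt.zero_grad()
        loss = loss_fn(model(local_data).squeeze(-1), local_labels)
        loss.backward()
        opt.step()
    return model.state_dict()  # keep local, or aggregate federated-style

# Synthetic "user interaction" data standing in for real on-device logs.
x = torch.randn(64, 16)
y = (x[:, 0] > 0).float()
weights = train_on_device(x, y)
```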
That's what I am worried about: from its description the intent looks good, but being aligned to only one industry or vertical defeats the purpose of ubiquity. People won't trust it if all they see is a volatile currency.