I feel like these guys are missing a pretty important point in their own analysis. I tried setting up an ollama LLM on a fly.io GPU machine and it was near impossible because of fly.io limitations such as:
1. Their infrastructure doesn't support streaming responses well at all (which is an important part of the LLM experience in my view).
2. The LLM itself is massive and can't be part of the Docker image I was building and uploading. Fly doesn't have a nice way around this, so I had to set up a whole heap of code to pull it in on the Fly machine's first invocation, which doesn't work well once you start running multiple machines. It was messy and ended up in a long support ticket with them that didn't get it working any better, so I gave up.
What kind of issues did you have with streaming? I also set up ollama on fly.io, and had no issues getting streaming to work.
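For what it's worth, streaming from ollama is just newline-delimited JSON over HTTP, so a minimal consumer looks roughly like this (a sketch only; the model name is a placeholder and localhost:11434 is ollama's default address):

```python
import json
import requests

# /api/generate streams newline-delimited JSON chunks by default.
# "llama3" is a placeholder; use whichever model you've pulled.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?"},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break
```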
For the LLM itself, I just used a custom startup script that downloaded the model once ollama was up. It's the same thing I'd do on a local cluster though. I'm not sure how fly could make it better unless they offered direct integration with ollama or some other inference server?
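A rough sketch of that startup-script idea (in Python for illustration; the port is ollama's default and the model name is a placeholder, and it assumes the model directory sits on persistent storage so later boots skip the download):

```python
import subprocess
import time

import requests

OLLAMA_URL = "http://localhost:11434"  # ollama's default port
MODEL = "llama3"                       # placeholder; use whatever model you serve

# Block until the ollama server is answering requests.
while True:
    try:
        requests.get(OLLAMA_URL, timeout=2)
        break
    except requests.RequestException:
        time.sleep(1)

# Pull the model through the running server; the pull is effectively a
# no-op if the model is already present on disk.
subprocess.run(["ollama", "pull", MODEL], check=True)
```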
I mean, yes? Managing giant model weight files is a big problem with getting people on-demand access to Docker-based micro-VMs. I don't think we missed that point so much as that we acknowledged it, and found some clarity in the idea that we weren't going to break up our existing DX just to fix it. If there were lots and lots and lots of people trying to self-host LLMs running into this problem, it would have been a harder call.