Hacker News

I feel like these guys are missing a pretty important point in their own analysis. I tried setting up an ollama LLM on a fly.io GPU machine and it was near impossible because of fly.io limitations such as:

1. Their infrastructure doesn't support streaming responses well at all (which is an important part of the LLM experience in my view).

2. The LLM itself is massive and can't be part of the Docker image I was building and uploading. Fly doesn't have a nice way around this, so I had to set up a whole heap of code to pull it in on the fly machine's first invocation, which doesn't work well once you start to run multiple machines.

It was messy and ended up in a long support ticket with them that didn't get it working any better, so I gave up.


What kind of issues did you have with streaming? I also set up ollama on fly.io, and had no issues getting streaming to work.

For the LLM itself, I just used a custom startup script that downloaded the model once ollama was up. It's the same thing I'd do on a local cluster though. I'm not sure how fly could make it better unless they offered direct integration with ollama or some other inference server?
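The startup-script approach described above can be sketched roughly like this. This is a minimal illustration, not the commenter's actual script: the model name and the `OLLAMA_MODEL` variable are placeholders, and Ollama's default port (11434) and its `/api/tags` endpoint are assumed.

```shell
#!/bin/sh
# Sketch of a first-boot startup script: start ollama, wait for its API,
# then pull the model weights. Assumes `ollama` and `curl` are on PATH.
set -e

MODEL="${OLLAMA_MODEL:-llama3}"   # illustrative model name, not prescriptive

# Start the inference server in the background.
ollama serve &

# Wait until the API answers before trying to pull.
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done

# Download the weights; effectively a no-op if they're already cached
# (e.g. on a persistent volume mounted at the ollama data dir).
ollama pull "$MODEL"

# Keep the server process in the foreground for the VM supervisor.
wait
```

Caching the weights on a persistent volume is what makes this tolerable across restarts; without one, every fresh machine re-downloads the model, which is the multi-machine pain the parent comment describes.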


I mean, yes? Managing giant model weight files is a big problem with getting people on-demand access to Docker-based micro-VMs. I don't think we missed that point so much as that we acknowledged it, and found some clarity in the idea that we weren't going to break up our existing DX just to fix it. If there were lots and lots and lots of people trying to self-host LLMs running into this problem, it would have been a harder call.


Did you consider other use cases where people need custom models and inference, beyond just open-source LLMs?


Yes. Click through to the L40S post the article links to (the L40S's aren't going anywhere).

There are people doing GPU-enabled inference stuff on Fly.io. That particular slice of the market seems fine?



