
With a 70B param model how many tokens/second?

Did the math: assuming 100% utilization and equal performance (which is certainly not the case), the payback on your Mac is 9 months...
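The payback arithmetic above is just hardware cost divided by the hourly rental rate. A minimal sketch: the $4.40/hr rate for 4x A100 appears later in this thread, but the hardware cost below is a hypothetical placeholder, not a figure from the discussion.

```python
# Break-even calculation: how long until owned hardware pays for itself
# versus renting. All figures except the rental rate are assumptions.
rental_rate = 4.40        # $/hr for 4x A100 at Lambda Labs (from the thread)
hardware_cost = 5000.0    # hypothetical purchase price (placeholder, not from the thread)

hours_to_break_even = hardware_cost / rental_rate
months = hours_to_break_even / (24 * 30)  # assumes 100% utilization, 24/7

print(f"Break-even after {hours_to_break_even:.0f} GPU-hours "
      f"(~{months:.1f} months at full utilization)")
```

Real utilization is far below 100%, so the actual payback period stretches proportionally: at 25% utilization, multiply the result by four.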



You need to pay for dedicated instances because on-demand ones are generally unavailable in the moment. So it's more like 45 days if we're only talking about a single GPU, but we're talking about ~2x.


How many tokens a second? Really trying to figure out viability.

4x NVIDIA A100 at Lambda Labs is $4.40 an hour, and I really have not had an issue getting them.



Thanks! Yeah, I opted for dual 3090s for my workstation (keeping the full LLM in VRAM is critical); I was wondering what the lift was for the M2.

OP implied that there were workloads where it out-competes renting in terms of cost. I was hoping that was true for something other than a single-user interactive session (which can be done a lot cheaper).



