
With a 70B param model how many tokens/second?

Did the math: assuming 100% utilization and equal performance (which is certainly not the case), the payback on your Mac is 9 months...
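The payback arithmetic above is just hardware cost divided by the hourly rental rate. A minimal sketch: the $4.40/hr rate for 4x A100 appears later in this thread, but the hardware cost below is a hypothetical placeholder, not a figure from the discussion.

```python
# Break-even calculation: how long until owned hardware pays for itself
# versus renting. All figures except the rental rate are assumptions.
rental_rate = 4.40        # $/hr for 4x A100 at Lambda Labs (from the thread)
hardware_cost = 5000.0    # hypothetical purchase price (placeholder, not from the thread)

hours_to_break_even = hardware_cost / rental_rate
months = hours_to_break_even / (24 * 30)  # assumes 100% utilization, 24/7

print(f"Break-even after {hours_to_break_even:.0f} GPU-hours "
      f"(~{months:.1f} months at full utilization)")
```

Real utilization is far below 100%, so the actual payback period stretches proportionally: at 25% utilization, multiply the result by four.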



You need to pay for dedicated instances because on-demand ones are generally unavailable in the moment. So it's more like 45 days if we're only talking about a single GPU, but we're talking about ~2x.


How many tokens a second? Really trying to figure out viability.

4x NVIDIA A100 at Lambda Labs is $4.40 an hour, and I really have not had an issue getting them.



Thanks! Yeah, I opted for dual 3090s for my workstation (keeping the full LLM in VRAM is critical); I was wondering what the lift was for the M2.

OP implied that there were workloads where it out-competes renting in terms of cost. I was hoping that was true for something other than a single-user interactive session (which can be done a lot cheaper).



