For basic usage, you can get away with a small graphics card or no graphics card at all (although it will be very slow).
The general rule of thumb: take the model size in billions of parameters (7B, 13B, 34B, 70B) and multiply it by 0.5 or 0.625. If the result (in GB) is smaller than the combined amount of system RAM and VRAM in your machine, you can run the model at 4-bit or 5-bit quantization respectively.
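Those multipliers aren't magic numbers: they're just bits-per-weight divided by 8 (4/8 = 0.5, 5/8 = 0.625). A quick sketch of the rule of thumb in Python (the function name is my own, and it ignores the small extra overhead for context/KV cache):

```python
def estimated_memory_gb(params_billion: float, bits: int) -> float:
    # Rough rule of thumb: billions of parameters times bytes-per-weight.
    # At 4-bit quantization each weight takes 0.5 bytes; at 5-bit, 0.625.
    return params_billion * bits / 8

# 7B at 4-bit needs roughly 3.5 GB; 70B at 5-bit roughly 43.75 GB.
print(estimated_memory_gb(7, 4))   # 3.5
print(estimated_memory_gb(70, 5))  # 43.75
```

So a machine with 16 GB RAM and 8 GB VRAM (24 GB combined) can comfortably fit a 34B model at 4-bit (~17 GB) but not a 70B (~35 GB).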
A jacked-up PC can do really well, and there is much fun to be had there.
...but you'd struggle to get close to even GPT-3.5, let alone 4, for generic tasks.
For custom tunes... sure, custom fine-tunes will beat generic OpenAI models. But that's a bit like pitting custom-tuned cars against street-legal manufacturer cars. It's an apples-to-oranges comparison.