
Very rough (!) napkin math: for a q8 model (almost lossless), the parameter count in billions is roughly the VRAM requirement in GB, since each weight takes about one byte. For q4, with some quality loss, it's roughly half that. Then you add a little bit for the context window and overhead. So a 32B model at q4 needs about 16 GB for the weights and should run comfortably on 20-24 GB.

Again, very rough numbers; there are calculators online.
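
If it helps, the napkin math above can be written down as a tiny sketch. The function name `estimate_vram_gb` and the flat `overhead_gb=4.0` default are my own assumptions for illustration, not measured values; real context-window and runtime overhead varies with context length and the inference stack.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 4.0) -> float:
    """Rough VRAM estimate in GB for a quantized model.

    params_billion:  parameter count in billions (e.g. 32 for a 32B model)
    bits_per_weight: 8 for q8, 4 for q4
    overhead_gb:     assumed flat allowance for context window and runtime
                     overhead (a guess, not a measured figure)
    """
    # One billion weights at 8 bits each is about 1 GB, so q8 gives
    # parameters ~= GB, and q4 gives roughly half that.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


if __name__ == "__main__":
    for bits in (8, 4):
        print(f"32B at q{bits}: ~{estimate_vram_gb(32, bits):.0f} GB")
```

With that assumed 4 GB allowance, a 32B model at q4 lands at the low end of the 20-24 GB range; bump `overhead_gb` for longer contexts.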


