This video [1] shows someone running at 4-bit quant in 48gb VRAM. I suspect you ... | Hacker News


tarruda on Feb 23, 2024 | on: Phind-70B: Closing the code quality gap with GPT-4...

This video [1] shows someone running it at 4-bit quantization in 48 GB of VRAM. I suspect you'd need 4x that to run at full f16 precision, or roughly three H100s.

[1] https://www.youtube.com/watch?v=dJ69gY0qRbg
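The estimate above is simple scaling arithmetic. Here is a minimal sketch of it, assuming memory grows linearly with bits per weight (so 4-bit to 16-bit is a 4x factor) and assuming the 80 GB H100 variant. `gpus_needed` is a hypothetical helper, and it ignores per-GPU overhead and how the model is actually sharded:

```python
import math


def gpus_needed(total_gb: float, gpu_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined memory covers total_gb.

    Assumption: 80 GB per GPU (H100 80 GB variant); per-GPU overhead and
    sharding inefficiencies are ignored, so this is a lower bound.
    """
    return math.ceil(total_gb / gpu_gb)


# Assumption: memory scales linearly with bits per weight, so moving from
# 4-bit to 16-bit multiplies the observed 48 GB by 16 / 4 = 4.
observed_4bit_gb = 48.0
f16_gb = observed_4bit_gb * (16 / 4)

print(f16_gb)               # 192.0
print(gpus_needed(f16_gb))  # 3, since 192 / 80 = 2.4 rounds up to 3
```

Rounding up is the point of `math.ceil` here: 2.4 GPUs' worth of memory still means buying three cards.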

jxy on Feb 23, 2024

Yeah, 4-bit would take at least 35 GB, and 16-bit would be 140 GB. I'm more interested in how Phind is serving it, but I guess that's their trade secret.
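Those figures follow from parameter count times bytes per weight. A minimal sketch, assuming "70B" means exactly 70 billion parameters and using decimal GB (1e9 bytes); `weight_memory_gb` is a hypothetical helper that counts weights only, which is why real usage is higher ("at least"):

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and quantization metadata such as
    per-block scales, so actual VRAM usage will be higher.
    """
    return n_params * bits_per_weight / 8 / 1e9


n_params = 70e9  # assumption: "70B" taken as exactly 70 billion parameters
for bits in (4, 16):
    print(f"{bits:>2}-bit: {weight_memory_gb(n_params, bits):.0f} GB")
# Prints:
#  4-bit: 35 GB
# 16-bit: 140 GB
```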
