
If GPT-4 is 220B split across 8 experts, that would be in line with 3.5 Turbo being a 20B model, and GPT-4 activating 55B parameters out of a total of 220B.

It is ultimately all speculation until DeepSeek releases their own 145B MoE model; then we can compare the activations and results.
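
A rough back-of-the-envelope check of that reading (the top-2 routing and the decision to ignore shared, non-expert parameters are my assumptions, not anything confirmed in the thread):

    # "220B total across 8 experts" reading
    total_params = 220e9
    num_experts = 8
    active_experts = 2                            # assumed top-2 routing

    per_expert = total_params / num_experts       # 27.5B per expert
    active = active_experts * per_expert          # 55B active per token
    print(f"{per_expert / 1e9:.1f}B per expert, {active / 1e9:.1f}B active")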



I think the conjecture is that each expert of GPT-4 has 220B parameters, for a total of 1.76T parameters.
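
For comparison, the same arithmetic under this per-expert reading (again assuming top-2 routing and ignoring shared parameters, purely as a sketch):

    # "220B per expert" reading
    per_expert = 220e9
    num_experts = 8
    active_experts = 2                            # assumed top-2 routing

    total = num_experts * per_expert              # 1.76T total
    active = active_experts * per_expert          # 440B active per token
    print(f"{total / 1e12:.2f}T total, {active / 1e9:.0f}B active")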



