
If GPT-4 is 220B split across 8 experts, that would be in line with 3.5 Turbo being a 20B model, and GPT-4 activating 55B parameters out of a total of 220B.

It is ultimately all speculation until DeepSeek releases their own 145B MoE model; then we can compare the activations and results.
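
A rough back-of-the-envelope check of that reading (the top-2 routing and the decision to ignore shared, non-expert parameters are my assumptions, not anything confirmed in the thread):

    # "220B total across 8 experts" reading
    total_params = 220e9
    num_experts = 8
    active_experts = 2                            # assumed top-2 routing

    per_expert = total_params / num_experts       # 27.5B per expert
    active = active_experts * per_expert          # 55B active per token
    print(f"{per_expert / 1e9:.1f}B per expert, {active / 1e9:.1f}B active")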



I think the conjecture is that each expert of GPT-4 has 220B parameters, for a total of 1.76T parameters.
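
For comparison, the same arithmetic under this per-expert reading (again assuming top-2 routing and ignoring shared parameters, purely as a sketch):

    # "220B per expert" reading
    per_expert = 220e9
    num_experts = 8
    active_experts = 2                            # assumed top-2 routing

    total = num_experts * per_expert              # 1.76T total
    active = active_experts * per_expert          # 440B active per token
    print(f"{total / 1e12:.2f}T total, {active / 1e9:.0f}B active")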



