
Andrej Karpathy's Take:

Official post on Mixtral 8x7B: https://mistral.ai/news/mixtral-of-experts/

Official PR into vLLM shows the inference code: https://github.com/vllm-project/vllm/commit/b5f882cc98e2c9c6...

New HuggingFace explainer on MoE, very nice: https://huggingface.co/blog/moe

In naive decoding, performance is a bit above a 70B model (Llama 2), at the inference speed of a ~12.9B dense model (out of 46.7B total params).
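The 46.7B total vs ~12.9B active split falls out of the architecture. A back-of-the-envelope count, assuming the published Mixtral 8x7B config values (hidden size 4096, FFN size 14336, 32 layers, GQA with 8 KV heads, 32k vocab, 8 experts, top-2 routing) and ignoring minor terms like layer norms and the router, lands near both figures:

  # Rough Mixtral 8x7B parameter count (back-of-the-envelope sketch,
  # assuming the published config values; small terms omitted).
  d_model, d_ff, n_layers = 4096, 14336, 32
  n_kv_heads, head_dim = 8, 128
  vocab = 32000
  n_experts, top_k = 8, 2

  attn = n_layers * (2 * d_model * d_model                    # q_proj, o_proj
                     + 2 * d_model * n_kv_heads * head_dim)   # k_proj, v_proj (GQA)
  expert = 3 * d_model * d_ff                                 # gate, up, down projections
  experts_total = n_layers * n_experts * expert
  embeddings = 2 * vocab * d_model                            # token embedding + lm_head

  total = attn + experts_total + embeddings
  active = attn + n_layers * top_k * expert + embeddings      # only 2 experts run per token

  print(f"total  ~ {total / 1e9:.1f}B")    # ~ 46.7B
  print(f"active ~ {active / 1e9:.1f}B")   # ~ 12.9B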

Notes:

- Glad they refer to it as an "open weights" release instead of "open source", which would IMO require the training code, dataset, and docs.

- The "8x7B" name is a bit misleading: it is not all 7B params that are being 8x'd, only the FeedForward blocks in the Transformer are 8x'd; everything else stays the same. Hence the total number of params is not 56B but only 46.7B.

- More confusion I see is around expert choice: note that each token, at each layer, selects 2 different experts (out of 8) (see the sketch below).

- Mistral-medium

Source: https://twitter.com/karpathy/status/1734251375163511203
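To make the routing point concrete, here is a minimal sketch of a top-2 MoE feed-forward block in PyTorch. Class and variable names are illustrative and not Mistral's actual implementation (see the vLLM PR above for that); the point is that the router picks 2 of 8 experts independently for every token at every layer, so only those 2 experts' FFNs run for that token.

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class SwiGLUExpert(nn.Module):
      """One feed-forward 'expert' (SwiGLU MLP, as in Mistral-style models)."""
      def __init__(self, dim: int, hidden_dim: int):
          super().__init__()
          self.w1 = nn.Linear(dim, hidden_dim, bias=False)   # gate projection
          self.w3 = nn.Linear(dim, hidden_dim, bias=False)   # up projection
          self.w2 = nn.Linear(hidden_dim, dim, bias=False)   # down projection

      def forward(self, x):
          return self.w2(F.silu(self.w1(x)) * self.w3(x))

  class MoEFeedForward(nn.Module):
      """Replaces the dense FFN block: 8 experts, each token routed to its top 2."""
      def __init__(self, dim: int, hidden_dim: int, n_experts: int = 8, top_k: int = 2):
          super().__init__()
          self.experts = nn.ModuleList(SwiGLUExpert(dim, hidden_dim) for _ in range(n_experts))
          self.gate = nn.Linear(dim, n_experts, bias=False)  # the router
          self.top_k = top_k

      def forward(self, x):                      # x: (n_tokens, dim)
          logits = self.gate(x)                  # (n_tokens, n_experts)
          weights, idx = torch.topk(logits, self.top_k, dim=-1)
          weights = F.softmax(weights, dim=-1)   # normalize over the 2 chosen experts
          out = torch.zeros_like(x)
          for e, expert in enumerate(self.experts):
              rows, slots = torch.where(idx == e)            # tokens that picked expert e
              if rows.numel() == 0:
                  continue
              out[rows] += weights[rows, slots, None] * expert(x[rows])
          return out

Since only top_k experts execute per token, the per-token compute scales with top_k (2) rather than with the full expert count (8), which is why inference cost sits near a ~12.9B dense model despite 46.7B total parameters.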



Anyone else have a feeling Karpathy may leave OpenAI to join an actually open AI startup, where he can openly speak about training tweaks, the datasets, the architecture, etc.?

It seems that OpenAI has recently become the least open startup. Even the Gemini team talks more about their architecture.

OpenAI still doesn't openly mention that GPT-4 is a mixture-of-experts model.



