
Inference cost should be closer to Llama 13B, since it only runs 2 of the 8 experts for each token.
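Rough back-of-the-envelope math, assuming a Mixtral-style 8x7B config (hidden size 4096, SwiGLU FFN size 14336, 32 layers, grouped-query attention with 8 KV heads, 8 experts, top-2 routing; the comment doesn't name the model, so all dimensions here are assumptions):

    # Parameter count for an 8-expert, top-2 MoE transformer.
    # All dimensions are assumed (Mixtral-like), not taken from the thread.
    d_model, d_ff, n_layers, vocab = 4096, 14336, 32, 32000
    n_heads, n_kv_heads, head_dim = 32, 8, 128
    n_experts, top_k = 8, 2

    attn = d_model * n_heads * head_dim          # Q projection
    attn += 2 * d_model * n_kv_heads * head_dim  # K and V projections (GQA)
    attn += n_heads * head_dim * d_model         # output projection
    expert_ffn = 3 * d_model * d_ff              # SwiGLU: gate, up, down
    embeddings = 2 * vocab * d_model             # input embedding + LM head

    total = n_layers * (attn + n_experts * expert_ffn) + embeddings
    active = n_layers * (attn + top_k * expert_ffn) + embeddings
    print(f"total:  {total / 1e9:.1f}B params")    # ~46.7B
    print(f"active: {active / 1e9:.1f}B / token")  # ~12.9B

The attention weights and embeddings are shared across all routing decisions; only the FFN experts are switched per token, which is why the per-token FLOPs land near a dense 13B model while all ~47B weights still have to sit in memory.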
Does it have to run them sequentially? I'd guess the compute cost will be at the 12-13B level either way, but latency could be lower if the experts run in parallel?
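No, it doesn't have to. The two experts selected for a token are independent matmuls, so an implementation can dispatch them concurrently; the usual trick is to group tokens by expert so each expert runs a single batched matmul. A minimal sketch of top-2 routing in PyTorch (the class and its names are hypothetical, not any model's actual code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Top2MoE(nn.Module):
        def __init__(self, d_model, d_ff, n_experts=8, top_k=2):
            super().__init__()
            # Router scores every token against every expert.
            self.router = nn.Linear(d_model, n_experts, bias=False)
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ])
            self.top_k = top_k

        def forward(self, x):                # x: (n_tokens, d_model)
            logits = self.router(x)          # (n_tokens, n_experts)
            weights, idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            # Group tokens by expert: each expert does ONE batched matmul
            # over every token routed to it. The loop iterations are
            # independent, so a real kernel can launch them concurrently.
            for e, expert in enumerate(self.experts):
                tok, slot = (idx == e).nonzero(as_tuple=True)
                if tok.numel():
                    out[tok] += weights[tok, slot, None] * expert(x[tok])
            return out

The sequential loop here is only for clarity: with the expert calls overlapped, per-token latency can approach that of a dense ~13B model rather than paying for the two experts back to back.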