Mistral's partnership with Cerebras for inference hardware has received less commentary than I expected. On raw speed they're basically blowing the competition out of the water, with Le Chat getting 1,100+ tokens per second of per-user throughput.
I'm curious when someone will run the right experiment: an LLM on Cerebras reasoning so well, at such scale, and so fast that it produces something genuinely novel.
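To make the speed gap concrete, here's a back-of-envelope sketch. The 1,100 tok/s figure is the one above; the ~50 tok/s baseline for a typical GPU-served chat deployment is my own rough assumption, not a measured number, and the 100k-token reasoning trace is a hypothetical.

```python
# Back-of-envelope: how long does a very long reasoning trace take to generate?
# 1,100 tok/s is the Le Chat / Cerebras figure cited above.
# 50 tok/s is an assumed rough baseline for typical GPU serving, not a benchmark.

REASONING_TOKENS = 100_000  # hypothetical extended chain-of-thought

for name, tokens_per_second in [
    ("Cerebras (Le Chat)", 1_100),
    ("typical GPU serving (assumed)", 50),
]:
    seconds = REASONING_TOKENS / tokens_per_second
    print(f"{name}: {seconds:,.0f} s (~{seconds / 60:.1f} min)")
```

At 1,100 tok/s a 100k-token trace takes about a minute and a half instead of half an hour, which is what turns the "so big so fast" experiment from a batch job into something you'd actually sit and wait for.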