
I tend to agree that many explanations of Raft don't get into the useful details and handwave some of the hard problems. But the original paper does do a good job of this and is pretty accessible to read, IMO.

> I read "once the leader has been elected", um, hang on, according to whom? Has node 1 finally agreed on the leader, just while node 3 has given up and started another election?

I think the simple response to "according to whom" is "the majority of voting nodes". When the leader assumes its role, it sends heartbeats, which are then accepted by the other nodes in the cluster. Even if (in your example) node 3 starts a new election, it will only succeed if it can get a majority of votes. If node 2 has already acknowledged a leader, it won't vote for node 3 in the same term.

There are some implicit concessions inherent there around eventual consistency, but I don't think that's novel to Raft compared to other distributed consensus protocols.



> The simple response I think to "according to whom" is "the majority of voting nodes".

Reminds me of this one time we had a Raft cluster arguing over who was the leader for 20 minutes in production. Raft leader election is non-deterministic, while Paxos is deterministic. It can 'randomly' get into a situation it cannot resolve for quite a long time.


> Reminds me of this one time we had a Raft cluster arguing over who was the leader for 20 minutes in production

That's certainly an interesting failure mode. Do you recall the details around root cause? I could imagine ephemeral network partitions (flapping interfaces? peering loss?) causing something like this for sure.

In my own experience, I've been running services that use Raft under the hood in production for the last ~10 years and haven't seen this happen myself. Though I absolutely do remember misconfigured election timeouts causing very painful latency in failover scenarios.


Root cause was “bad luck” IIRC. Every node voted mostly for itself.


Ah, interesting. That sort of split voting is indeed very bad luck, potentially a config-specific issue, or just a cluster that's seeing a catastrophic partition failure between every node.

In canonical Raft, assuming no partition failures, this could only happen if every node's election timeout fired at roughly the same time and they all became candidates simultaneously. For this state to persist (assuming short election timeouts and short heartbeat intervals), you have to get _really_ unlucky, since the timeouts are re-randomized every round.

In terms of probabilistic likelihood though, this is about as likely as the live-lock issue in Paxos, in which multiple proposals with differing proposal IDs are made at the same time. You'd see a similar delay in consensus in that scenario as well. Obviously Multi-Paxos handles this with a separate leader-election algorithm, which makes that outcome much less likely, but the same strategies those systems use to mitigate contention (randomized backoffs, for example) can be applied in Raft as well.


Yeah, IIRC, we updated the configuration some. I don't remember what specifically, but now that you mention short timeouts, I vaguely remember that coming up as a problem.



