
I tend to agree that many explanations of Raft don't get into the useful details and handwave some of the hard problems. But the original paper does do a good job of this and is pretty accessible to read, IMO.

> I read "once the leader has been elected", um, hang on, according to whom? Has node 1 finally agreed on the leader, just while node 3 has given up and started another election?

I think the simple response to "according to whom" is "the majority of voting nodes". When the leader assumes its role, it sends heartbeats, which are then accepted by the other nodes in the cluster. Even if (in your example) node 3 starts a new election, it will only succeed if it can get a majority of votes. If node 2 has already acknowledged a leader, it won't vote for node 3 in the same term.

There are some implicit concessions inherent there around eventual consistency, but I don't think that's novel to Raft compared to other distributed consensus protocols.



> The simple response I think to "according to whom" is "the majority of voting nodes".

Reminds me of this one time we had a Raft cluster arguing over who was the leader for 20 minutes in production. Raft leader election is non-deterministic, while Paxos is deterministic. It can 'randomly' get into a situation it cannot resolve for quite a long time.


> Reminds me of this one time we had a Raft cluster arguing over who was the leader for 20 minutes in production

That's certainly an interesting failure mode. Do you recall the details around root cause? I could imagine ephemeral network partitions (flapping interfaces? peering loss?) causing something like this for sure.

In my own experience, I've been running services that use Raft under the hood in production for the last ~10 years and haven't seen this happen myself. Though I absolutely do remember misconfigured election timeouts causing very painful latency in failover scenarios.


Root cause was “bad luck” IIRC. Every node voted mostly for itself.


Ah, interesting. That sort of split voting is indeed very bad luck, potentially a config-specific issue, or just a cluster that's seeing a catastrophic partition failure between every node.

In canonical Raft, assuming no partition failures, this could only happen if every node's election timeout fired at roughly the same time and they all became candidates simultaneously. For this state to persist (assuming short election timeouts and short heartbeat intervals), you have to get _really_ unlucky, since the timeouts are re-randomized every round.

In terms of probabilistic likelihood though, this is about as likely as the live-lock issue in Paxos, in which multiple proposals with differing proposal IDs are made at the same time. You'd see a similar delay in consensus in that scenario as well. Obviously Multi-Paxos handles this with a separate leader-election algorithm, which makes that outcome much less likely, but the same strategies those systems use to mitigate contention (randomized backoffs, for example) can be applied in Raft as well.


Yeah, IIRC, we updated the configuration some. I don't remember what specifically, but now that you mention short timeouts, I vaguely remember that coming up as a problem.



