_benedict's comments

I can’t speak for the authors, but I have been lucky enough to be collaborating with them on behalf of the Apache Cassandra project, to refine and prove the correctness of the Accord protocol - a derivative of EPaxos we have integrated into the database.

It would be fantastic if such a project could be pursued for this variant, which has the distinction of being the only “real world” implementation.

Either way, thank you for the original EPaxos paper - it has been a privilege to convert its intuitions into a practical system.


“Paxos” is a term that can mean many different things, so it’s better not to get too attached to any one meaning, especially across different contexts.

Multi Paxos is commonly used (especially in industry) as shorthand for multi-decree Paxos (in contrast to single-decree Paxos), but “Paxos” most often refers to the family of protocols, all of which are typically implemented with a leader. It is confusing, of course, because single-decree Paxos is used to implement EPaxos (and its derivatives).

It’s worth noting also that Lamport is (supposedly) on the record as having intended “Paxos” to refer to the protocol incorporating the leader optimisation.


I remember there being sufficient documentary evidence in the entrance/shop/museum bit to conclude it was most likely created by the very people who “discovered” it, to serve as a tourist attraction.

Thank you for linking the source material; unfortunately, it badly contradicts you. It clearly shows that the _very first_ list of ten suggested search terms contained (pretty heavily) sexualised suggestions.


I suppose some of that stuff could reasonably be called "sexualized". Pornographic? No. A problem? Not unless you have really weird hangups.

Here's a unified list of all the "very first list" suggestions they say they got. I took these from their appendix, alphabetized them, and coalesced duplicates. Readers can make their own decisions about whether these justify hauling out the fainting couch.

+ Adults

+ Adults on TikTok (2x)

+ Airfryer recipes

+ Bikini Pics (2x)

+ Buffalo chicken recipe

+ Chloe Kelly leg up before penalty

+ cost of living payments

+ Dejon getting dumped

+ DWP confirm £1,350

+ Easy sweet potato recipes

+ Eminem tribute to ozzy

+ Fiji Passed Away

+ Gabriela Dance Trend

+ Hannah Hampton shines at women’s eu [truncated]

+ Hardcore pawn clips (2x)

+ Has Ozzy really died

+ Here We Go Series 3 Premieres on BBC

+ HOW TO GET FOOTBALL BLOSSOM IN…

+ ID verification on X

+ Information on July 28,2.,,,

+ Jet2 holiday meme

+ Kelly Osbourne shared last video with [truncated]

+ Lamboughini

+ luxury girl

+ Nicki Minaj pose gone wrong

+ outfits

+ Ozzy Funeral in Birmingham

+ pakistani lesbian couple in bradford

+ revenge love ep 13 underwater

+ Rude pics models (2x)

+ Stock Market

+ Sydney Sweeney allegations

+ TikTok Late Night For

+ TIKTOK SHOP

+ TikTok Shop in UK

+ TIKTOK SHOP UK

+ Tornado in UK 2025

+ Tsunami wave footage 2025

+ Unshaven girl (3x)

+ Very rude babes (3x)

+ very very rude skimpy

+ woman kissing her man while washing his [truncated] (2x)


No, the ruling expressly refers to the list as non-exhaustive, but given the other related references to misconduct (including the use of inappropriate language) it was not reasonable to infer that this example was gross misconduct.


Worth noting, not quite "everyone" does this. Cassandra uses "leaderless" (single-decree) Paxos, which has some advantages and some disadvantages (for instance, 1RT WAN reads from any region).

I agree with you that Paxos is simpler than Raft. The problem with Paxos IMO is that Lamport's original paper is impenetrable; lots of later writing is easier to understand, including pieces that describe more complex protocols. The intuitions are actually pretty straightforward, and transfer to all of the extensions to Paxos (which are not as straightforwardly compatible with Raft).

Raft may have helped more people get comfortable with distributed consensus, and sped its adoption, but being a sort of dangling branch of the tech tree I wonder if this may have stalled progress beyond it.
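To illustrate how small the core intuition is, here is a minimal single-decree Paxos sketch in Python. It is not Cassandra's implementation: the names `Acceptor` and `propose` are made up for illustration, messages are plain method calls, and failures, persistence and retries are all ignored. Any node may act as proposer, which is the "leaderless" flavour mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = 0                            # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value) last accepted

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots; report any accepted proposal."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has been promised meanwhile."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run both phases; return the chosen value, or None if this ballot lost."""
    quorum = len(acceptors) // 2 + 1

    replies = [a.prepare(ballot) for a in acceptors]
    promises = [prior for ok, prior in replies if ok]
    if len(promises) < quorum:
        return None

    # Safety rule: if any acceptor already accepted something, adopt the value
    # with the highest ballot instead of our own.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]

    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None
```

With three fresh acceptors, `propose(acceptors, 1, "a")` chooses `"a"`; a later `propose(acceptors, 2, "b")` also returns `"a"`, because the second proposer must adopt the already-accepted value; and a stale `propose(acceptors, 1, "c")` returns `None`.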


Interesting way to think about it. I am not sure I quite agree, but good points and surrounding discussion for sure.


Do you anywhere elaborate what you mean by leaderless, and how this affects the semantics and guarantees you offer?

So far as I understand, both Kafka and Pulsar use (leader-based) consensus protocols to deliver some of their features and guarantees, so to match these you must either have developed a leaderless consensus protocol, or modified the guarantees you offer, or else still rely on a leader-based consensus protocol?

From one of your other answers, you mention you rely on Apache Bookkeeper, which appears to be leader-based?

I ask because I am aware of only one industry leaderless consensus protocol under development (and I am working on it), and it is always fun to hear about related work.


Whoa, a leaderless consensus protocol sounds pretty revolutionary!! So many questions -- do you have any resources on this you could share?


Revolutionary may be an overstatement, it just affords different system characteristics. There's plenty of literature on the topic though, starting generally with EPaxos[1]. The protocol that we are developing is for Apache Cassandra, is called Accord[2], and forms the basis of our new distributed transaction feature [3]. I will note that the whitepaper linked in [3] is a bit out of date, and there was a bug in the protocol specification at that time. We hope to publish an updated paper in a proper venue in the near future.

[1] https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf
[2] https://github.com/apache/cassandra-accord
[3] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15...


https://www.vldb.org/pvldb/vol15/p1337-lee.pdf

Is this you also or total coincidence?


Not even a coincidence really, it's a very different kind of system. It's an implementation of Hermes with network layer integration. Hermes is designed with very different goals in mind, specifically within-DC consensus with minimal failures (with the caveat I am not intimately familiar):

- Every replica must acknowledge a write, which is undesirable in a WAN setting, due to having to wait for replies from the furthest region

- At most one concurrent "read-modify-write" operation may succeed, so peak throughput is limited by request latency

- Failure of any replica requires reconfiguration for any request to succeed (equivalent to leader election), so the leaderless property here does not improve tail latencies; indeed, they are likely harmed by exposing your workload to more required reconfigurations

Cassandra is designed for multiple (usually quite far apart) DC deployments that want to maximise availability and minimise latency, and where failure is expected. Here a quorum system is typically preferable for request latency.
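A back-of-the-envelope sketch of the latency points above, in Python. The round-trip times are invented numbers for a hypothetical three-region deployment, and the functions are illustrative only; they model neither Hermes nor Cassandra in any detail.

```python
def all_ack_latency(rtts_ms):
    """Every replica must acknowledge: we wait for the slowest round trip."""
    return max(rtts_ms)


def quorum_latency(rtts_ms):
    """A simple majority must acknowledge: we wait for the quorum-th fastest."""
    quorum = len(rtts_ms) // 2 + 1
    return sorted(rtts_ms)[quorum - 1]


def serial_throughput_per_s(latency_ms):
    """If only one operation can be in flight at a time, throughput is 1/latency."""
    return 1000.0 / latency_ms


# Hypothetical round trips from the coordinator: local, nearby and far region.
rtts = [1, 70, 150]
```

With these made-up numbers, waiting for all replicas costs 150 ms, a majority quorum costs 70 ms, and strictly serialised operations at 150 ms each top out below 7 per second.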


This doesn’t seem to provide higher write availability, and if the read replicas are consistent with the write replica this design must surely degrade write availability as it improves read availability, since the write replica must update all the read replicas.

This also doesn’t appear to describe a higher durability design at all by normal definitions (in the context of databases at least) if it’s async…?


Yeah, this is not about write availability, but as the OP/author points out, scaling that is not the bottleneck for most apps.


I think you may have misunderstood the GP and are perhaps misusing terminology. You cannot meaningfully scale vertically to improve write availability, and if you care about availability a single machine (and often a primary/secondary setup) is insufficient.

Even if you only care about scaling reads, eventually the 1:N write:read replica ratio will become too costly to maintain, and long before you reach that point you likely sacrifice real-time isolation guarantees to maintain your write availability and throughput.


> You cannot meaningfully scale vertically to improve write availability

Disagree. Even if you limit yourself to the cloud, r7i/r8g.48xl gets you 192 vCPU / 1.5 TiB RAM. If you really want to get silly, x2iedn.32xl is 128 vCPU / 4 TiB RAM, and you get 3.8 TiB of local NVMe storage for temp tablespace. The money you’ll pay ($16.5K - $44K per month, depending on specific class) would pay for a similarly spec’d server in the same amount of time, though.

Which brings me to the novel concept of owning your own hardware. A quick look at Supermicro’s site shows a 2U w/ up to 1.92 PB of Gen5 NVMe, 8 TiB of RAM, and dual sockets. That would likely cost a wee bit more than a month of renting the aforementioned AWS VM, but a more reasonably spec’d one would not. Realistically, that much storage would be used as SDS for other DBs to use. NVMe-oF isn’t quite as fast as local disks, but it’s a hell of a lot faster than EBS et al.

The point is that you actually can vertically scale to stupidly high levels, it’s just that most companies have no idea how to run servers anymore.

> and if you care about availability a single machine (and often a primary/secondary setup) is insufficient.

Depending on your availability SLOs, of course, I think you’d find that a two-node setup (optionally having N read replicas) with one in standby would be quite sufficient. Speaking from personal experience on RDS (MySQL fronted with ProxySQL on K8s, load balanced with NLB), I experienced a single outage in two years. When it happened, no one noticed, it was so brief. Some notice-only alerts for 500s in Slack, but no pages went out.


> If you really want to get silly, x2iedn.32xl is 128 vCPU / 4 TiB RAM, and you get 3.8 TiB of local NVMe

This doesn't affect availability - except insofar as unavailability might be caused by insufficient capacity, which is not the typical definition.

> Depending on your availability SLOs, of course

Yes, exactly. Which is the point the GP was making. You generally make the trade-off in question not for performance, but because you have SLOs demanding higher availability. If you do not have these SLOs, then of course you don't want to make that trade-off.
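To put rough numbers on "depending on your SLOs", here is a small Python sketch. It assumes independent node failures and instant, perfect failover, both of which are optimistic, and the function names are invented for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(slo: float) -> float:
    """Minutes of unavailability per year that an availability SLO permits."""
    return (1.0 - slo) * MINUTES_PER_YEAR


def redundant_availability(node_availability: float, nodes: int) -> float:
    """Availability if the service is up whenever at least one node is up.

    Assumes independent failures and instant failover (an idealisation).
    """
    return 1.0 - (1.0 - node_availability) ** nodes
```

For example, a 99.9% SLO allows roughly 525.6 minutes of downtime a year, and two nodes that are each 99% available would, under these idealised assumptions, reach about 99.99%.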


> This doesn't affect availability - except insofar as unavailability might be caused by insufficient capacity, which is not the typical definition.

I agree, but it seemed to me that GP was using it as such: "You cannot meaningfully scale vertically to improve write availability"


The big caveat about these configurations is the amount of time it takes to rebuild a replica due to the quantity of storage per node that has to be pushed over the network. This is one of the low-key major advantages of disaggregated storage.

I prefer to design my own hardware infrastructure but there are many operational tradeoffs to consider.
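The rebuild-time caveat can be estimated with simple arithmetic. This Python sketch assumes the network link is the only bottleneck; the `efficiency` factor is a made-up fudge for protocol overhead and competing traffic, not a measured value.

```python
def rebuild_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours to copy data_tb terabytes over a link of link_gbps gigabits/s.

    efficiency is a hypothetical fraction of line rate actually achieved.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600
```

At full line rate, `rebuild_hours(100, 25, efficiency=1.0)` comes to just under 9 hours for 100 TB over a 25 Gb/s link, and real rebuilds would usually be slower than that.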


> you likely sacrifice real-time isolation guarantees to maintain your write availability and throughput

No worries there, in all likelihood isolation has probably been killed twice already. Once by running the DB on READ COMMITTED, and a second time by using an ORM like EF to read data into your application, fiddle with it in-RAM, and write the new (unrelated-to-what-was-read) data back to the DB.

In other words, we throw out all that performant 2010-2020 NoSQL & eventual consistency tech, and go back to good old fashioned SQL & ACID, because everyone knows SQL, and ACID is amazing. Then we use LINQ/EF instead because it turns out that no-one actually wants to touch SQL, and full isolation is too slow so that gets axed too.
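A toy Python sketch of the read-modify-write problem described above. `Store` is a hypothetical in-memory stand-in for one row, not a real database or ORM API; the point is only that two sessions reading the same value and blindly writing back lose an update, while a version check (optimistic concurrency) detects the conflict.

```python
class Store:
    """Toy single-row store with a version counter (hypothetical)."""

    def __init__(self, balance):
        self.balance = balance
        self.version = 0

    def read(self):
        return self.balance, self.version

    def blind_write(self, balance):
        # Overwrites whatever is there, like an ORM saving a stale object.
        self.balance = balance
        self.version += 1

    def write_if_version(self, balance, expected_version):
        # Succeeds only if nobody else wrote since we read.
        if self.version != expected_version:
            return False
        self.balance = balance
        self.version += 1
        return True


def lost_update_demo():
    store = Store(100)
    a, _ = store.read()          # session A reads 100
    b, _ = store.read()          # session B reads 100
    store.blind_write(a + 10)    # A writes 110
    store.blind_write(b + 20)    # B writes 120, silently discarding A's +10
    return store.balance


def versioned_demo():
    store = Store(100)
    a, va = store.read()
    b, vb = store.read()
    store.write_if_version(a + 10, va)          # A succeeds
    if not store.write_if_version(b + 20, vb):  # B is rejected as stale...
        b, vb = store.read()                    # ...so it re-reads and retries
        store.write_if_version(b + 20, vb)
    return store.balance
```

`lost_update_demo()` ends at 120 rather than 130, while `versioned_demo()` ends at 130 because the stale write is rejected and retried.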


TigerBeetle uses VSR, which is basically a variant of MultiPaxos/Raft.


Nit! It’s a bit more historically accurate to say that MultiPaxos/Raft are later variants of VSR since Brian Oki’s VSR predated Paxos by a year (‘88 vs ‘89) and Liskov and Cowling’s revision of Brian’s work in 2012 predated Raft by two years (the papers are remarkably similar, but Raft makes concessions for the sake of presentation).


I know it was published first, we’ve talked about this before :)

But I’m not sure that what was published first decides what’s a variant of what. I would say that, given the breadth of research into variants of Paxos and the ways it can be modified, it is most meaningful today to say they’re all variants of Paxos.

VSR having had little to no research or industry application until recently has a pretty weak claim. It does not appear to have influenced either Paxos or Raft. Raft was influenced by Paxos, and even VSR revisited discusses it in relation to these protocols.


In fact, the Raft paper cites that it was most influenced by VSR:

“Raft is similar in many ways to existing consensus algorithms (most notably, Oki and Liskov’s Viewstamped Replication [29, 22])”

Happy to keep having this conversation, if only to shine a spotlight and pay tribute to some of the (lesser known but nevertheless) pioneers of our field. :)


I don’t interpret those words that way. I see that as a recognition of the VSR paper, as had been recently highlighted in VSR revisited at the time of publication. I guess you would have to ask the author if VSR had actually influenced his work, it’s certainly possible, but not the inference I would make from that snippet.

The paper references Paxos something like 100 times, versus 3 for VSR. It defines itself as a more understandable alternative to Paxos, so it was certainly influenced both by the existence and relevance of Paxos, and also in opposition to its apparent difficulty.


A good example to illustrate this perhaps is Babbage. He invented the computer first, but nobody using computers today was influenced by him, impressive though his achievements were! Nor would we say that computers are a kind of Babbage “analytical engine”. We say they are a kind of computer.


Ha, as it happens there's documentary evidence online from Diego himself, that he was not influenced by VSR.

https://groups.google.com/g/raft-dev/c/cBNLTZT2q8o


Diego there is only referring to “the VRR paper” (note the double “R”), i.e. specifically the VR “Revisited” paper of Cowling and Liskov in 2012 (not Oki and Liskov’s ‘88 work, which has a different title).

I wish I could share with you some of the anecdotes I’ve been privy to, having dived into the events and personally interviewed some of the people involved.

The history (or total order!) of consensus is fascinating here, almost like a Greek island, but only a few people will ever know it.


Fair enough!


I’m not sure how a written constitution that is anyway interpreted by “the prevailing political elite class” is functionally much different?


At least there are words.

The Brits have nothing.


> At least there are words.

> The Brits have nothing.

There are words in the British constitution as well. Acts of Parliament that define how the Parliament and the courts function are constitutional laws, such as the Parliament Acts of 1911 & 1949 and the Constitutional Reform Act 2005. If we are going by words, there are a lot more words in these multiple constitutional documents than in the constitutional documents of many countries that only have one such document.


It's not meaningfully a constitution if it can be overridden in practice by a simple parliamentary majority vote, same as any other law. It's more like the "constitutions" that some absolute monarchies have or had in the past where the first thing it does is declare the monarch above any limits, just not quite as overt.


The constitution is the legal framework by which a country is governed. It is not necessarily a set of super-laws that are harder to change than regular laws (although it may contain such laws). The UK is also not the only democracy where the legislature can amend parts of the constitution with a simple majority. Besides, the UK itself has a super-law that cannot be amended, which is that the parliament is sovereign and cannot be bound by a previous parliament.

Neither the monarch, nor individual Members of Parliament, are above all limits under UK law.

