
> So a long time ago, in a galaxy far far away, when the network was purportedly the computer, these questions kept me awake at night.

Interesting. That was a bit before "my time". It was just passed down as folk wisdom -- "never make the network transparent, it cannot be done easily" -- and I never quite knew where that came from.

I remember asking a senior developer about RPC after seeing a bunch of RPC daemons on a Sun machine, and he said it's some really complicated stuff that you don't want to know about unless you absolutely need to use it. I was just a young intern then.

But there are a whole bunch of technologies from that time that try to abstract the network away (I can think of NFS, or I guess pretty much any remote storage that makes itself appear as a local file system).

Then I've seen RPC API philosophies that take two sides. Some magically proxy method calls and marshal data between objects; others want users to always marshal by hand and send it over sockets themselves.
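A toy sketch of that contrast (hypothetical names, JSON over a raw socket purely for illustration -- not any real RPC framework's API). The manual style makes the caller marshal and frame every call; the proxy style hides exactly the same work behind attribute access, so the network "disappears" from the call site:

```python
import json
import socket

# Manual-marshalling style: the caller serializes the call and frames it
# (4-byte big-endian length prefix) before writing to the socket.
def call_manual(sock, method, args):
    payload = json.dumps({"method": method, "args": args}).encode()
    sock.sendall(len(payload).to_bytes(4, "big") + payload)

# Proxy style: a __getattr__ hook turns any attribute access into a stub
# that performs the same marshalling under the covers.
class Proxy:
    def __init__(self, sock):
        self._sock = sock

    def __getattr__(self, method):
        def stub(*args):
            call_manual(self._sock, method, list(args))
        return stub
```

With the proxy, `proxy.ping(1, 2)` looks like a local call but puts exactly the same bytes on the wire as `call_manual(sock, "ping", [1, 2])` -- which is both the appeal and, per the thread, the danger.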

> You have to be able to know exactly what the lowermost piece of the puzzle is going to do before you can reason about what pieces that depend on that may or may not do.

So is it possible to abstract any of that away? Maybe it is not a horizontal abstraction, as in "hide the network away", but a vertical abstraction, as in creating socket-like objects that have extra features (like ZeroMQ). Connecting, sending, and receiving are still there, but there is extra help on top at each step.
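In that spirit, a minimal sketch of a vertical abstraction (my own toy code, not ZeroMQ's actual API): the object is still visibly a socket you connect, send on, and receive on, but send/recv gain message framing so the caller deals in whole messages rather than byte streams:

```python
import socket
import struct

class MsgSocket:
    """Socket-like wrapper: same connect/send/recv shape, plus framing."""

    def __init__(self, sock):
        self.sock = sock

    def send_msg(self, data: bytes):
        # Prefix each message with its 4-byte network-order length.
        self.sock.sendall(struct.pack("!I", len(data)) + data)

    def recv_msg(self) -> bytes:
        # Read the length header, then exactly that many payload bytes.
        (length,) = struct.unpack("!I", self._recv_exact(4))
        return self._recv_exact(length)

    def _recv_exact(self, n: int) -> bytes:
        # recv() may return short reads; loop until n bytes arrive.
        buf = b""
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-message")
            buf += chunk
        return buf
```

The network is not hidden -- a `ConnectionError` still surfaces at the call site -- but one recurring chore (framing) is layered on top of each step.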

On another side note, I think network topologies and characteristics have changed. It used to be that creating consistent distributed systems was easier even though network speeds were slower: networks were more likely to be local to a data center, so the chance of a network split was much lower. (And I've learned there are few things more evil in a general distributed system than a network split.) Today's distributed systems are more likely to experience network splits (multi-zone and multi-datacenter clusters are more common, plus the unpredictability added by using VMs). So maybe, paradoxically, distributed systems really got harder, not easier.



> So is it possible to abstract any of that away? Maybe it is not a horizontal abstraction, as in "hide the network away", but a vertical abstraction, as in creating socket-like objects that have extra features (like ZeroMQ). Connecting, sending, and receiving are still there, but there is extra help on top at each step.

It's possible to design bits of it away (and that becomes necessary in order to build larger systems). Protobufs were, in a lot of ways, one response.

The key takeaway for me from that time, and going forward, was that distributed services are effectively messages in flight. The more messages in flight, the more combinations of arrival times; the fewer messages in flight, the less scaling. This became what is known as the CAP theorem. (Wonderfully articulated by Brewer -- when I read it I said, "Yeah, that!")
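A toy way to see the "combinations of arrival times" point: with n independent messages in flight, a receiver may legally observe any of the n! arrival orders, so the space of interleavings a system must tolerate grows factorially with concurrency.

```python
from itertools import permutations

def arrival_orders(messages):
    # Every permutation of the in-flight messages is a possible
    # arrival order at the receiver.
    return sorted(set(permutations(messages)))

# 3 messages in flight -> 6 possible orders; 4 -> 24; 5 -> 120.
orders = arrival_orders(["A", "B", "C"])
```

This is only the single-receiver case; with multiple receivers each seeing its own order, the state space compounds further.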

For things like RAID arrays, where you can relate the state of the data in the array mathematically to the data that you know, you can abstract it away to a pretty straightforward open/read/write interface. For things where the data set mutates along a path (imagine a state machine of n states, where its 'path' is the sequence of states it passes through between time 0 and time t), if correctness is a function of the path then you have a much harder time of it. The canonical example of this is two writers to a file: the sequence Insert A -> Insert B -> Insert C -> Delete Last leaves behind a different file than Insert A -> Insert B -> Delete Last -> Insert C.
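That example can be replayed directly with a tiny simulation (my own sketch, treating the "file" as a list of inserted records): the same four operations, in two different arrival orders, leave different files behind.

```python
def apply_ops(ops):
    # Replay a sequence of (op, *args) tuples against an empty "file".
    buf = []
    for op, *arg in ops:
        if op == "insert":
            buf.append(arg[0])
        elif op == "delete_last":
            buf.pop()
    return buf

# Same four operations, two arrival orders:
order1 = [("insert", "A"), ("insert", "B"), ("insert", "C"), ("delete_last",)]
order2 = [("insert", "A"), ("insert", "B"), ("delete_last",), ("insert", "C")]

apply_ops(order1)  # -> ['A', 'B']
apply_ops(order2)  # -> ['A', 'C']
```

Since correctness here depends on the whole path of states, no interface that merely hides message ordering can make both outcomes "the" answer.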

Unwinding those sorts of things (or at least illuminating them) seems to be what Jepsen is trying to solve.



