
Interesting that this paper contains hard numbers hinting at Google's absolute scale. They say Monarch has 144,000 leaves. Even if each leaf is assigned only one CPU core -- which is probably an underestimate, because who would do that? -- that makes Google's monitoring stack alone a Top 100 supercomputer.

The only other places I've seen Google give out hard numbers were a presentation by Jeff Dean mentioning MapReduce core-years consumed per day, and a footnote in a paper mentioning how much CPU time Google Exacycle donates to scientific computing every day. All of these calibration points are eye-opening.



They also estimate close to a petabyte of RAM; if all of that RAM sits in the leaves, that is about 6-7 GB per leaf. I don't think we can say it would be unreasonable to have one core per leaf replica; presumably some leaves have low utilisation, so it might make sense to share the core with another workload. From a capacity-planning standpoint, I think they leave this open, though they do indicate that they are sometimes CPU-bound, so they don't try to compress beyond delta compression. That might suggest multiple cores per leaf.
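The per-leaf figure is just the two numbers from the paper divided out; a quick back-of-envelope check (assuming the "close to a petabyte" is a decimal petabyte and all of it is attributed to leaves):

```python
# Back-of-envelope: RAM per leaf, given ~1 PB total RAM and 144,000 leaves.
# Both the 1 PB figure and the all-RAM-in-leaves assumption are from the
# discussion above, not exact numbers.
TOTAL_RAM_BYTES = 10**15   # ~1 petabyte (decimal)
NUM_LEAVES = 144_000

ram_per_leaf_gb = TOTAL_RAM_BYTES / NUM_LEAVES / 10**9
print(f"{ram_per_leaf_gb:.1f} GB per leaf")  # ~6.9 GB
```

which lands right in the 6-7 GB range quoted above.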

A single core design allows a very simple concurrency model, without having to worry about cache pingponging, false sharing, or myriad other issues. The parallelism is applied at higher layers, as there are multiple replicas for each leaf and obviously they can use many cores effectively overall.

I don't see that the paper gives enough information to help us prune the design space here.


We don't run Exacycle any more (I built and ran Exacycle for several years). It's not a cost-effective way to do science, but yes, the scale was absolutely insane. These days, I'm more interested in seeing if there are ways to use TPUs, rather than CPUs, for similar kinds of opportunistic computing.


Yeah, the "supercomputer" ranking is a bit of a joke. Every mid-sized Google DC would count as a top-10 supercomputer.


I work at one such "mid-sized" Google DC. Supercomputers are typically much more interconnected, whilst we have a much more traditional topology.


Supercomputers are more about the network topology than raw processing power.


Supercomputers, by tradition, are tightly coupled in a way that Google datacenter servers aren't. The closest things Google has to supercomputers are GPUs linked by high-performance networks, and TPU pods (which have their own custom toroidal mesh).


Isn't the whole point of a supercomputer that it isn't just a datacentre with an LED display on the front?


When I gave the presentation at Facebook's @Scale NYC conference last summer, I talked the powers that be into using more specific numbers, and they allowed it for the paper too.



It's not a supercomputer.

The reason supercomputers are so tiny compared to cloud datacenters is that cloud workloads are highly and mostly-trivially parallel, whereas supercomputer workloads are tightly coupled: mesh interconnects and serial dependencies between nodes.


That appears to be your own private definition of the term. There are lots of things in the Top500 list that are just a pile of Xeon boxes with Ethernet.


*that's using Monarch


Is the entire internet the world's largest supercomputer?



