
Ah, a Cray. Buck up; there are better supercomputers.


It was, in fact. How did you infer that?


/scratch on a Cray is (last I used one) a Lustre filesystem. It is the only file storage available to the compute nodes, which have no local storage of their own. No spinning disk, no SSD. So if the code makes frequent small writes (e.g. it's peppered with print statements), all the compute nodes hammer the Lustre nodes at once, the Lustre nodes become the bottleneck, and eventually they fall over.
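The usual way around this is to coalesce many small writes into a few large ones before they reach the shared filesystem. Below is a minimal sketch of that idea in plain Python. It is not a Lustre or Cray API. `CoalescingWriter`, `write_per_line` and `write_coalesced` are hypothetical names made up for illustration, and the 4 MiB default chunk size is an assumption, not a tuned stripe size.

```python
import os
import tempfile


class CoalescingWriter:
    """Collect many small writes in memory and hand them to the OS in large chunks.

    Hypothetical helper; chunk_size is an illustrative default, not a Lustre-tuned value.
    """

    def __init__(self, fd: int, chunk_size: int = 4 * 1024 * 1024):
        self.fd = fd
        self.chunk_size = chunk_size
        self._buf = bytearray()
        self.syscalls = 0  # number of os.write calls actually issued

    def write(self, data: bytes) -> None:
        self._buf += data
        if len(self._buf) >= self.chunk_size:
            self.flush()

    def flush(self) -> None:
        data = bytes(self._buf)
        self._buf.clear()
        while data:  # os.write may write fewer bytes than requested, so loop
            n = os.write(self.fd, data)
            data = data[n:]
            self.syscalls += 1

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.flush()  # push out whatever is left when the block ends
        return False


def write_per_line(path: str, lines) -> int:
    """The problem pattern: one write call per print-style line.

    Assumes each small write completes in a single call, as it normally does for regular files.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    calls = 0
    try:
        for line in lines:
            os.write(fd, line)
            calls += 1
    finally:
        os.close(fd)
    return calls


def write_coalesced(path: str, lines, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Same output, but routed through CoalescingWriter; returns the write-call count."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        with CoalescingWriter(fd, chunk_size) as w:
            for line in lines:
                w.write(line)
    finally:
        os.close(fd)
    return w.syscalls


if __name__ == "__main__":
    lines = [f"step {i}: residual=0.0\n".encode() for i in range(100_000)]
    with tempfile.TemporaryDirectory() as d:
        print("per-line write calls: ", write_per_line(os.path.join(d, "a.log"), lines))
        print("coalesced write calls:", write_coalesced(os.path.join(d, "b.log"), lines))
```

Both functions produce the same file; the difference is that the per-line version issues one write per line while the coalesced one issues a handful. Multiply that by every compute node in the job and you can see why the metadata and object servers struggle. In real code, Python's own `open(path, "wb", buffering=N)` or accumulating output in node memory and writing it out at checkpoints gets you the same effect without a custom class.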


Interesting! This was exactly the case on the system I used. I didn't realize Cray was the only vendor who went the no-local-storage route.


They're not. They're just the only ones who insist on doing it with a filesystem that can't handle the load.

The other problem is that their interconnect is relatively fragile. It's comparatively easy to crash the entire network, at which time your filesystem goes away and processing stops.

But thanks to Lustre, even when it's good, it's bad.



