Hacker News

This. Honestly, depending on the task, hundreds of GB can still be in the "single computer" realm, because setting up a cluster just isn't worth it in terms of time, money, and administration overhead. That said, parallel + out-of-core computation doesn't necessarily imply a cluster: single-node Spark or something like Dask works fine if you're in the Python world.
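As a minimal sketch of the out-of-core idea on a single machine: pandas can stream a CSV in fixed-size chunks and fold partial aggregates together, so peak memory is bounded by the chunk size rather than the file size. The file name and column names here are made up for illustration.

```python
import csv
import os
import tempfile

import pandas as pd

# Build a small stand-in file (a real workload would be hundreds of GB).
path = os.path.join(tempfile.mkdtemp(), "events.csv")
with open(path, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["user", "amount"])
    for i in range(10_000):
        w.writerow([f"u{i % 7}", i])

# Stream the file in chunks and combine the partial group sums;
# only one chunk is ever resident in memory at a time.
partials = []
for chunk in pd.read_csv(path, chunksize=1_000):
    partials.append(chunk.groupby("user")["amount"].sum())
result = pd.concat(partials).groupby(level=0).sum()

print(result)
```

Dask's DataFrame API automates exactly this chunk-and-combine pattern behind a pandas-like interface, which is why it scales past RAM without any cluster at all.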


Setting up an ad hoc (aka standalone) Spark cluster with a bunch of machines you have control over is a ridiculously trivial task, though. You designate one machine as the master, start the others as workers pointed at it, and then just submit jobs to the master. That's all.
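Roughly, the standalone setup looks like this, assuming Spark is unpacked at `$SPARK_HOME` on every machine; the master hostname `node0` and the job file `my_job.py` are hypothetical placeholders:

```shell
# On the machine designated as master:
$SPARK_HOME/sbin/start-master.sh
# The master now listens at spark://node0:7077 (and serves a web UI on :8080).

# On every other machine, start a worker pointed at the master
# (the script was called start-slave.sh before Spark 3.x):
$SPARK_HOME/sbin/start-worker.sh spark://node0:7077

# Then submit jobs to the master from anywhere:
$SPARK_HOME/bin/spark-submit --master spark://node0:7077 my_job.py
```

No resource manager like YARN or Kubernetes is involved; the standalone master does the scheduling itself.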


Spark is slow, though. On the other hand, Pandas is also extraordinarily slow :D



