[1] http://www.ssfpack.com/
SsfPack will still be faster and less memory-hungry than Python, though. From the brief look I had, it also seems that the nonlinear/non-Gaussian simulation methods are not implemented.