What implementations do you suggest? I'd be happy to bring this up to the developers if there's a chance of speeding up our default RNG with one that's just as safe. Note that it must also be compatible with our MIT/Apache 2.0 dual license.
SUPERCOP [1] has a bunch of public-domain implementations. [2] is a portable one. To get top speed, you need SIMD, though. The available implementations in SUPERCOP are in (x86) assembly, but they really should be converted to intrinsics to be more portable.
I don't know whether Rust has intrinsics or some other kind of vector register support. If it does, I'd even volunteer to implement Salsa20 in it.
Rust supports SIMD and exposes the compiler intrinsics that LLVM supports. Inline assembly is supported as well. Note, however, that the SIMD support will quite possibly change as it becomes more first-class.
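For reference, here is a minimal scalar sketch of the Salsa20 quarter-round, the building block any port would start from. It assumes current stable Rust and uses only `u32` wrapping addition, XOR, and rotation; the function name `quarter_round` is made up for this illustration and is not an existing API. A SIMD version would apply the same operations lane-wise across vector registers.

```rust
/// One Salsa20 quarter-round on four 32-bit words.
/// Uses only wrapping addition, XOR, and the fixed rotations 7, 9, 13, 18.
/// `quarter_round` is a name chosen for this sketch, not a library function.
pub fn quarter_round(y: [u32; 4]) -> [u32; 4] {
    let [y0, y1, y2, y3] = y;
    // Each output word feeds the next step, so the order matters.
    let z1 = y1 ^ y0.wrapping_add(y3).rotate_left(7);
    let z2 = y2 ^ z1.wrapping_add(y0).rotate_left(9);
    let z3 = y3 ^ z2.wrapping_add(z1).rotate_left(13);
    let z0 = y0 ^ z3.wrapping_add(z2).rotate_left(18);
    [z0, z1, z2, z3]
}
```

Calling `quarter_round([1, 0, 0, 0])` should give `[0x08008145, 0x00000080, 0x00010200, 0x20500000]`, which is one of the quarter-round examples in the Salsa20 specification. The full cipher applies this function to the columns and rows of a 4x4 word state for 20 rounds, which is not shown here.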