As for why the naive Rust version is slower: without a BufWriter, Rust's stdout is line-buffered, so each line triggers a write system call, whereas Python's stdout is block-buffered. Python 2 emits writes of 4096 bytes, and Python 3... 8193 bytes (edit: not a typo, this is 8KiB + 1). That's the likely cause for it being slower.
Edit:
A minimal form of the naive Rust version would be:
    fn main() {
        loop {
            println!("y");
        }
    }
On the same machine as the Python tests above, I get:
    $ ./yes | pv -r > /dev/null
    [4.81MiB/s]
which is actually as slow as Python 3, despite making roughly four thousand times more system calls (one per 2-byte line rather than one per 8193 bytes).
A version with buffering would look like:
    use std::io::{stdout, BufWriter, Write};

    fn main() {
        let stdout = stdout();
        let mut out = BufWriter::new(stdout.lock());
        loop {
            writeln!(out, "y").unwrap();
        }
    }
This produces 129MiB/s on the same machine, and it does strictly the same thing as the Python version (with a default buffer size of 8KiB, apparently).
And fwiw, on the same machine, both GNU yes and the full Rust solution from OP do 10.5GiB/s.
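The system call arithmetic above can be checked without strace. Here is a rough sketch, assuming that `LineWriter` is a fair stand-in for stdout's line buffering and that each call to the sink's `write` corresponds to one write system call. `CountingSink`, `calls_line_buffered` and `calls_block_buffered` are made-up names for illustration, not anything from OP's code.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink: discards the data and counts calls to `write`,
/// standing in for the number of write system calls.
struct CountingSink {
    calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len()) // pretend the whole buffer was accepted
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Push `lines` copies of "y\n" through a line-buffered writer
/// and report how many times the sink was written to.
fn calls_line_buffered(lines: usize) -> usize {
    let mut sink = CountingSink { calls: 0 };
    {
        let mut out = LineWriter::new(&mut sink);
        for _ in 0..lines {
            out.write_all(b"y\n").unwrap();
        }
        out.flush().unwrap();
    }
    sink.calls
}

/// Same workload, but through a block buffer of `capacity` bytes.
fn calls_block_buffered(lines: usize, capacity: usize) -> usize {
    let mut sink = CountingSink { calls: 0 };
    {
        let mut out = BufWriter::with_capacity(capacity, &mut sink);
        for _ in 0..lines {
            out.write_all(b"y\n").unwrap();
        }
        out.flush().unwrap();
    }
    sink.calls
}

fn main() {
    let lines = 1 << 20; // about a million lines, 2 bytes each
    println!("line-buffered: {} writes", calls_line_buffered(lines));
    println!("8KiB buffer:   {} writes", calls_block_buffered(lines, 8192));
}
```

The line-buffered path should report one write per line, and the 8KiB path a few thousand times fewer, which matches the ratio quoted above. It says nothing about throughput, only about how many writes reach the underlying sink.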
> As for why the naive rust version is slower, it's because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call, while with python, stdout is buffered. Python 2 emits writes of 4096 bytes, and python 3... 8193 bytes. That's the likely cause for it being slower.
Does it have nothing to do with the fact that string-of-bytes is the default in Python 2, whereas string-of-characters is the default in Python 3? Or is that perhaps related to the explanation you gave? Forcing the byte interpretation, Python 3 is slightly faster than Python 2 for me. Forcing the character interpretation, Python 2 wins, but not by as much as before.
Your bytes version outputs lines of, literally, `b'y'`.
The characters version is still a clear win for Python 2 on my machine (8.9MiB/s vs. 5.6MiB/s).
It's also worth noting that Python only buffers this way because the output is a pipe to pv. If it were a terminal, it would be line-buffered, like the naive Rust version.
In both cases, Python uses a 4KiB buffer. That's still way slower than the equivalent Rust code with a 4KiB buffer (use `BufWriter::with_capacity(4096, stdout.lock())` instead of `BufWriter::new(stdout.lock())`).
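To compare like with like, a Rust program can mimic that choice: leave stdout line-buffered on a terminal and wrap it in a 4KiB `BufWriter` otherwise. This is only a sketch. `buffer_capacity_for` is a hypothetical helper, the 4096 figure is taken from the measurement above rather than from any Python documentation, and `IsTerminal` needs Rust 1.70 or later. The loop is bounded so the example terminates; a real yes would loop forever.

```rust
use std::io::{self, BufWriter, IsTerminal, Write};

/// Hypothetical policy: no extra buffer on a terminal (stdout is already
/// line-buffered there), a 4KiB block buffer for pipes and files.
fn buffer_capacity_for(is_tty: bool) -> Option<usize> {
    if is_tty {
        None
    } else {
        Some(4096)
    }
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let is_tty = stdout.is_terminal();
    let lock = stdout.lock();

    // Pick the writer once, up front, based on what stdout is connected to.
    let mut out: Box<dyn Write> = match buffer_capacity_for(is_tty) {
        Some(capacity) => Box::new(BufWriter::with_capacity(capacity, lock)),
        None => Box::new(lock),
    };

    // Bounded for the sake of the example.
    for _ in 0..1000 {
        out.write_all(b"y\n")?;
    }
    out.flush()
}
```

Piped into pv, this takes the `BufWriter` path. Run directly in a terminal, it falls back to one write per line.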