
Fun fact:

  $ python yes.py | pv -r > /dev/null
  [11.2MiB/s]

  $ python3 yes.py | pv -r > /dev/null
  [4.95MiB/s]
As for why the naive rust version is slower, it's because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call, while with python, stdout is buffered. Python 2 emits writes of 4096 bytes, and python 3... 8193 bytes (edit: not a typo, this is 8KiB + 1). That's the likely cause for it being slower.
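To make the difference in write counts concrete, here is a minimal sketch that counts how many raw writes reach the underlying sink with and without a buffer. `CountingSink` and `count_writes` are hypothetical names made up for this illustration, and the sink is in-memory, so this only models the number of calls, not real syscall cost.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical in-memory sink that counts raw write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.nbytes = 0

    def writable(self):
        return True

    def write(self, data):
        # Each call here stands in for one write system call.
        self.calls += 1
        self.nbytes += len(data)
        return len(data)


def count_writes(lines, buffer_size=None):
    """Write `lines` copies of b"y\\n"; return (raw write calls, total bytes)."""
    sink = CountingSink()
    if buffer_size is None:
        out = sink  # one raw write per line, like a line-buffered stdout
    else:
        out = io.BufferedWriter(sink, buffer_size=buffer_size)
    for _ in range(lines):
        out.write(b"y\n")
    out.flush()
    return sink.calls, sink.nbytes
```

With `count_writes(1000000)` every line becomes its own raw write, while `count_writes(1000000, buffer_size=8192)` should need only a few hundred raw writes for the same number of bytes.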

Edit: A minimal version of the naive rust version would be:

  fn main() {
    loop {
      println!("y");
    }
  }
On the same machine as with the python tests above, I get:

  $ ./yes | pv -r > /dev/null
  [4.81MiB/s]
which is only about as slow as python 3, despite making roughly 4 thousand times more system calls.
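The "4 thousand times" figure follows from the write sizes quoted earlier: one write per 2-byte line versus one write per 8193 bytes. A quick sketch of the arithmetic (the constant and function names are only for illustration):

```python
BYTES_PER_LINE = len(b"y\n")  # each line is "y" plus a newline: 2 bytes
PY2_WRITE_SIZE = 4096         # write size quoted above for python 2
PY3_WRITE_SIZE = 8193         # write size quoted above for python 3


def lines_per_write(write_size, line_size=BYTES_PER_LINE):
    """How many lines fit in one buffered write of `write_size` bytes."""
    return write_size / line_size


print(lines_per_write(PY3_WRITE_SIZE))  # 4096.5
print(lines_per_write(PY2_WRITE_SIZE))  # 2048.0
```

So a line-buffered writer issues about 4096 writes for every single write that python 3 makes.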

A version with buffering would look like:

  use std::io::{stdout,BufWriter,Write};

  fn main() {
    let stdout = stdout();
    let mut out = BufWriter::new(stdout.lock());
    loop {
      writeln!(out, "y").unwrap();
    }
  }
This produces 129MiB/s on the same machine, while doing strictly the same thing as the python version (with a default buffer size of 8KiB, apparently).

And fwiw, on the same machine, both GNU yes and the full rust solution from OP do 10.5GiB/s.



> As for why the naive rust version is slower, it's because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call, while with python, stdout is buffered. Python 2 emits writes of 4096 bytes, and python 3... 8193 bytes. That's the likely cause for it being slower.

Does it have nothing to do with the fact that string-of-bytes is the default in Python 2, whereas string-of-characters is the default in Python 3? Or is that perhaps related to the explanation you gave? Forcing the byte interpretation, Python 3 is slightly faster than Python 2 for me. Forcing the character interpretation, Python 2 wins, but not by as much as before.

Bytes:

  while True:
      print(b'y')
Characters:

  while True:
      print(u'y')


Under python 3, your bytes version outputs lines of, literally, `b'y'`.

The characters version is still a clear win for python2 on my machine (8.9MiB/s vs. 5.6MiB/s)

It's also worth noting that the buffering behavior of python is only happening because the output is a pipe to pv. If it were the terminal, it would be line buffered, like the naive rust version.
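One way to check this on a given setup is to ask the stream itself. This is a minimal sketch, assuming Python 3, where text streams such as `sys.stdout` are `io.TextIOWrapper` objects with a `line_buffering` attribute; `describe` is a hypothetical helper name.

```python
import io
import sys


def describe(stream):
    """Report whether a stream is a terminal and whether it is line-buffered."""
    if isinstance(stream, io.TextIOWrapper):
        line_buffering = stream.line_buffering
    else:
        line_buffering = None  # not a TextIOWrapper, so no such attribute to read
    return {"isatty": stream.isatty(), "line_buffering": line_buffering}
```

Printing `describe(sys.stdout)` to `sys.stderr` (so the report stays visible when stdout is piped) should show different values when the script is run directly in a terminal and when it is piped into `pv`.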


python3 seems to do much better than either of those for me when using an unbuffered write(1, ...) syscall (plus it prints the correct thing)

    $ cat yes3.py
    stdout = open(1, 'wb')
    while True:
        stdout.write(b'y\n')
    $ python3 yes3.py | pv -r > /dev/null
    [13.7MiB/s]

    $ cat yes2.py
    import os
    stdout = os.fdopen(1, 'wb')
    while True:
        stdout.write('y\n')
    $ python2 yes2.py | pv -r > /dev/null
    [7.77MiB/s]


For better comparison with my numbers, I ran your scripts on my machine:

  $ python3 yes3.py | pv -r > /dev/null
  [18.4MiB/s]
  $ python2 yes2.py | pv -r > /dev/null
  [10.2MiB/s]
In both cases, a 4KiB buffer is used by python. That's still way slower than the equivalent rust code with a 4KiB buffer (use BufWriter::with_capacity(4096, stdout.lock()) instead of BufWriter::new(stdout.lock())).
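For anyone who wants to vary the buffer size on the python side as well, here is a minimal sketch, assuming python 3, where the built-in `open()` accepts a file descriptor and a `buffering` size in bytes. `open_buffered` and `write_lines` are hypothetical helper names, and the loop is bounded so it terminates.

```python
import os


def open_buffered(fd, buffer_size):
    """Wrap a file descriptor in a binary writer with an explicit buffer size."""
    # closefd=False leaves the descriptor open when the wrapper goes away.
    return open(fd, "wb", buffering=buffer_size, closefd=False)


def write_lines(out, count, line=b"y\n"):
    """Write `count` copies of `line`, then flush whatever is still buffered."""
    for _ in range(count):
        out.write(line)
    out.flush()
```

Calling `write_lines(open_buffered(1, 4096), n)` should mimic the 4KiB case above, and passing a larger size such as 65536 makes it easy to see how throughput changes with the buffer.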


Out of curiosity:

yes2.py:

  import os
  stdout = os.fdopen(1, 'wb')
  while True:
      stdout.write('y\n')


  $ python yes2.py | pv -r > /dev/null
  [9.12MiB/s]

  $ pypy yes2.py | pv -r > /dev/null
  [45.5MiB/s]
So pypy does a good job of speeding it up.

Off a quick 9 second run, python2 with profiling:

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    4.272    4.272    9.301    9.301 yes2.py:1(<module>)
   30856080    5.029    0.000    5.029    0.000 {method 'write' of 'file' objects}
          1    0.000    0.000    0.000    0.000 {posix.fdopen}
          1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


I got 1.26Gb/s in Python2 on MacOS (i7 mac air)

    import os
    ys = 'y\n' * 2048
    while True:
        os.write(1, ys)
update: seems to peak at around 3gb/s with ys = 'y\n' * 2 ** 16

Plain old `yes` gets a measly 33Mb/s!!


On my Macbook with way too much running, I get about 800MiB/s in Python2 with your script, but 1.12GiB/s with this Python3 script:

    import sys
    
    s = b'y\n' * 1024
    
    write = sys.stdout.buffer.write
    while True:
        write(s)
    
    # 1.12GiB/s
Plain old yes comes in at 22MiB/s.

The Python3 docs say to "use the underlying binary buffer object" when reading or writing binary data.

https://docs.python.org/3/library/sys.html


tested on my C1 on scaleway:

  perl -e 'print "y\n" while 1' | pv -r > /dev/null
  [3.37MB/s]
  perl -e 'print "y\n" x 2048 while 1' | pv -r > /dev/null
  [ 425MB/s]
  yes | pv -r > /dev/null
  [11.1MB/s]


>because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call,

Why the hell is it line-buffered when writing to a pipe? Yet another common sense “enhancement”?


It's always line-buffered, unlike python, where the behaviour depends on whether stdout is a tty or not.


That doesn’t answer my question, let me reformulate it: what’s the rationale behind this behavior?



