
The awk run against a file containing almost no duplicates took over an hour to finish (compared to 43 seconds for the sort method).
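For anyone unfamiliar with the two approaches, here is a minimal sketch on a tiny made-up sample. It assumes the "sort method" means something like `sort -u`, which is a guess on my part; the sample data and variable names are hypothetical.

```shell
# Hypothetical five-line sample with two duplicated lines.
sample=$(mktemp)
printf 'b\na\nb\nc\na\n' > "$sample"

# awk approach: x[$0]++ is 0 (false) the first time a line is seen,
# so !x[$0]++ is true and awk prints it. Every distinct line stays in
# the array x, which is why memory grows with the number of unique lines.
awk_out=$(awk '!x[$0]++' "$sample")

# sort approach (assumed to be sort -u): order is not preserved,
# but sort can spill to temporary files instead of holding everything in RAM.
sort_out=$(LC_ALL=C sort -u "$sample")

printf 'awk:\n%s\n' "$awk_out"
printf 'sort -u:\n%s\n' "$sort_out"

rm -f "$sample"
```

The awk version keeps first-occurrence order, while the sorted version does not, so the two are only interchangeable when output order does not matter.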

    $ time awk '!x[$0]++' x.out | wc -l
    16759719

    real    64m41.089s
    user    64m31.970s
    sys     0m3.136s

Peak memory usage for the 128MB input file (pid, rss, vsz, comm):

     8972 1239744 1246488  \_ awk

So more than 1GB of resident memory for a 128MB input file.
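A rough back-of-the-envelope check of those numbers, as a sketch only. It assumes the rss column is in KiB (as ps normally reports it), that "128MB" means 128 MiB, and that the line count printed by `wc -l` is close to the number of input lines, since the file had almost no duplicates. The variable names are mine.

```shell
rss_kib=1239744      # rss column from the ps output above
input_mib=128        # stated input size, assumed to be MiB
lines=16759719       # unique-line count from wc -l above

# Integer arithmetic, so results are rounded down.
ratio=$(( rss_kib / (input_mib * 1024) ))            # memory vs. input size
bytes_per_entry=$(( rss_kib * 1024 / lines ))        # awk memory per stored line
bytes_per_line=$(( input_mib * 1024 * 1024 / lines ))  # average input line length

echo "rss is about ${ratio}x the input size"
echo "about ${bytes_per_entry} bytes of rss per unique line"
echo "about ${bytes_per_line} bytes per input line"
```

Under those assumptions that works out to about 9x the input size, or about 75 bytes of resident memory per stored line for lines that average about 8 bytes each, so most of the footprint would be per-entry overhead rather than the line data itself.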

