
And just because this kind of thing is fun, if you use the right kind of pattern on a big enough file, SIMD can be quite noticeable:

    $ rg-with-simd --version
    ripgrep 0.8.1 (rev 223d7d9846)
    +SIMD +AVX
    $ rg-without-simd --version
    ripgrep 0.8.1
    -SIMD -AVX
    
    $ time cat OpenSubtitles2016.raw.en > /dev/null
    real    0m1.280s
    user    0m0.020s
    sys     0m1.257s
    $ time wc -l OpenSubtitles2016.raw.en
    336602465 OpenSubtitles2016.raw.en
    real    0m4.303s
    user    0m3.132s
    sys     0m1.167s
    $ time rg-with-simd -c 'Sherlock Holmes|John Watson|Professor Moriarty' OpenSubtitles2016.raw.en
    6033
    real    0m2.099s
    user    0m1.750s
    sys     0m0.347s
    $ time rg-without-simd -c 'Sherlock Holmes|John Watson|Professor Moriarty' OpenSubtitles2016.raw.en
    6033
    real    0m4.128s
    user    0m3.781s
    sys     0m0.343s
    $ time rg-with-simd -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2016.raw.en
    6731
    real    0m1.989s
    user    0m1.621s
    sys     0m0.366s
    $ time rg-without-simd -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2016.raw.en
    6731
    real    0m18.417s
    user    0m18.000s
    sys     0m0.403s

Looks like `cat` is still faster, so there's some room for improvement. ;-) With a single pattern, we're almost there:

    $ time rg -c 'Sherlock Holmes' OpenSubtitles2016.raw.en
    5107
    real    0m1.333s
    user    0m0.974s
    sys     0m0.357s

This one is mostly thanks to glibc's memchr implementation (which uses SIMD, of course) and the regex crate's frequency-based searcher.
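
For anyone curious what a frequency-based search looks like, here is a rough sketch of the general idea, not the regex crate's actual code. The `rank` table and the `find_rare` name are made up for illustration, and the plain `position` call stands in for a SIMD-accelerated memchr. The idea is to pick the byte in the needle that is expected to be rarest, scan for only that byte, and verify the full needle at each hit:

```rust
/// Hypothetical rarity heuristic: a lower score means "expected to be rarer".
/// This is an illustrative guess, not the table the regex crate uses.
fn rank(b: u8) -> u8 {
    match b {
        b' ' | b'e' | b't' | b'a' | b'o' => 255,
        b'a'..=b'z' => 200,
        b'A'..=b'Z' => 100,
        _ => 50,
    }
}

/// Returns the offset of the first occurrence of `needle` in `haystack`.
fn find_rare(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    if needle.len() > haystack.len() {
        return None;
    }
    // Pick the needle byte (and its offset) that we expect to be rarest.
    let (rare_off, &rare) = needle
        .iter()
        .enumerate()
        .min_by_key(|&(_, &b)| rank(b))
        .unwrap();

    // A match can't put the rare byte before index `rare_off`, so start there.
    let mut start = rare_off;
    while start < haystack.len() {
        // Stand-in for memchr: a real implementation would use SIMD here.
        let i = match haystack[start..].iter().position(|&b| b == rare) {
            Some(p) => start + p,
            None => return None,
        };
        // Line the needle up against the hit and verify the whole thing.
        let cand = i - rare_off;
        if haystack.len() - cand >= needle.len()
            && &haystack[cand..cand + needle.len()] == needle
        {
            return Some(cand);
        }
        start = i + 1;
    }
    None
}
```

Calling `find_rare(b"Mr. Sherlock Holmes", b"Sherlock Holmes")` scans for `S` under this made-up ranking and only does the full comparison where one shows up. The fewer false hits the chosen byte produces, the more time is spent in the fast byte scan.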

Of course, I'm presenting best cases here. Plenty of inputs can make ripgrep run quite a bit more slowly than what's shown here!

The crazy thing is that we're still only barely scratching the surface. Check out Intel's Hyperscan project for some truly next-level SIMD use in regex searching!



Uf da, that's still some good speedups.

BTW, as a happy daily user of rg, thanks for all the work you put into it. It definitely shows.



