And just because this kind of thing is fun: if you use the right kind of pattern on a big enough file, SIMD can make quite a noticeable difference:
$ rg-with-simd --version
ripgrep 0.8.1 (rev 223d7d9846)
+SIMD +AVX
$ rg-without-simd --version
ripgrep 0.8.1
-SIMD -AVX
$ time cat OpenSubtitles2016.raw.en > /dev/null
real 0m1.280s
user 0m0.020s
sys 0m1.257s
$ time wc -l OpenSubtitles2016.raw.en
336602465 OpenSubtitles2016.raw.en
real 0m4.303s
user 0m3.132s
sys 0m1.167s
$ time rg-with-simd -c 'Sherlock Holmes|John Watson|Professor Moriarty' OpenSubtitles2016.raw.en
6033
real 0m2.099s
user 0m1.750s
sys 0m0.347s
$ time rg-without-simd -c 'Sherlock Holmes|John Watson|Professor Moriarty' OpenSubtitles2016.raw.en
6033
real 0m4.128s
user 0m3.781s
sys 0m0.343s
$ time rg-with-simd -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2016.raw.en
6731
real 0m1.989s
user 0m1.621s
sys 0m0.366s
$ time rg-without-simd -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2016.raw.en
6731
real 0m18.417s
user 0m18.000s
sys 0m0.403s
Looks like `cat` is still faster, so there's some room for improvement. ;-) With a single pattern, we're almost there:
$ time rg -c 'Sherlock Holmes' OpenSubtitles2016.raw.en
5107
real 0m1.333s
user 0m0.974s
sys 0m0.357s
This one is mostly thanks to glibc's memchr implementation (which uses SIMD, of course) and the regex crate's frequency-based searcher.
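If "frequency-based searcher" sounds vague, here's a rough sketch of the general idea in Rust. To be clear, this is not the regex crate's actual code: the `rank` table below is made up for illustration, and a plain iterator scan stands in for memchr (which is where the SIMD would really come from). The trick is to pick the byte in the literal that you guess is rarest, scan for just that byte, and only do a full comparison at the candidates it turns up.

```rust
/// Hypothetical frequency ranks: a lower value means "assumed rarer".
/// Invented for this sketch; a real table would be built from sample data.
fn rank(b: u8) -> u8 {
    match b {
        b' ' | b'e' | b't' | b'a' | b'o' | b'i' | b'n' => 255,
        b's' | b'h' | b'r' | b'd' | b'l' | b'u' => 200,
        b'a'..=b'z' => 150,
        b'0'..=b'9' => 100,
        b'A'..=b'Z' => 50,
        _ => 10,
    }
}

/// Offset of the byte in `needle` with the lowest rank (first one on ties).
fn rarest_offset(needle: &[u8]) -> usize {
    needle
        .iter()
        .enumerate()
        .min_by_key(|&(_, &b)| rank(b))
        .map(|(i, _)| i)
        .unwrap_or(0)
}

/// Find the first occurrence of `needle` in `haystack` by scanning for the
/// needle's rarest byte and verifying the whole needle at each candidate.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    if needle.len() > haystack.len() {
        return None;
    }
    let off = rarest_offset(needle);
    let rare = needle[off];
    // The rare byte of any match must lie somewhere in off..=last.
    let last = haystack.len() - needle.len() + off;
    let mut pos = off;
    while pos <= last {
        // Stand-in for memchr: a real implementation would call a
        // SIMD-accelerated byte search here instead.
        let i = haystack[pos..=last].iter().position(|&b| b == rare)?;
        let start = pos + i - off;
        if &haystack[start..start + needle.len()] == needle {
            return Some(start);
        }
        pos += i + 1;
    }
    None
}

fn main() {
    let haystack = b"I met Sherlock Holmes today";
    match find(haystack, b"Sherlock Holmes") {
        Some(i) => println!("match at byte offset {}", i),
        None => println!("no match"),
    }
}
```

The payoff is that the hot loop is just a single-byte search, which is exactly the kind of thing SIMD is good at. If the guess about the rarest byte is bad for a particular input, you get lots of false candidates and things slow down.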
Of course, I'm presenting best cases here. Plenty of inputs can make ripgrep run quite a bit more slowly than what's shown here!
The crazy thing is that we're still only barely scratching the surface. Check out Intel's Hyperscan project for some truly next-level SIMD use in regex searching!