Wow, I didn't realise grep had a separate --mmap option. Interesting.
I don't think it would make much difference in the parent's case of many small, fragmented files: if you're mmapping each file in turn and it's not cached, it still has to be loaded from disk. The load just happens in a page fault instead of in the read() call.
Possibly if you mmapped all of the files and then used madvise() or something to prefetch in front of where you are in the list of files. Maybe grep does that, I don't know?
I guess the case where that technique would help is actually when you have a combination of (a) many files, (b) a computationally expensive pattern match (even just -i is a measurable hit) and (c) largeish files.
On many small files with a simple match, the disk I/O is still going to be the major component: even if you prefetch, you can't get around needing to load all the file contents from disk.
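For what it's worth, here is a rough sketch of the "mmap ahead and madvise()" idea in Python, just to make it concrete. It is not grep's code, and the names `grep_files`, `open_and_advise` and `PREFETCH_DEPTH` are made up for illustration. It assumes a Unix-like system where `mmap.madvise()` and `MADV_WILLNEED` are available (Python 3.8+); elsewhere the hint is simply skipped. The advice is only a hint to the kernel, so whether it actually overlaps disk I/O with matching depends on the OS and the disk.

```python
import mmap
import os
from collections import deque

PREFETCH_DEPTH = 4  # hypothetical look-ahead; a guess, not a tuned value


def open_and_advise(path):
    """Map a file read-only and hint the kernel to start reading it in."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return None  # empty files cannot be mmapped
        # The mapping stays valid after the file object is closed.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # MADV_WILLNEED is only a hint and does not exist on every platform.
    if hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        m.madvise(mmap.MADV_WILLNEED)
    return m


def grep_files(paths, needle, depth=PREFETCH_DEPTH):
    """Return the paths whose contents contain needle (bytes).

    While file i is being searched, files i+1 .. i+depth are already
    mapped and advised, so their pages may be on their way in from disk.
    """
    paths = list(paths)
    window = deque()  # mappings in the same order as paths
    next_idx = 0
    matches = []
    for i, path in enumerate(paths):
        # Top up the window so it covers the current file plus `depth` more.
        while next_idx < len(paths) and next_idx <= i + depth:
            window.append(open_and_advise(paths[next_idx]))
            next_idx += 1
        m = window.popleft()
        if m is None:
            continue
        try:
            if m.find(needle) != -1:
                matches.append(path)
        finally:
            m.close()
    return matches
```

Something like `grep_files(list_of_paths, b"needle")` would drive it. With a plain substring search like this, the matching is cheap, so I'd expect the disk to remain the bottleneck; the look-ahead should only pay off when the match itself takes real CPU time.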
Possibly if you mmapped all of the files and then used madvise() or something to prefetch in front of where you are. Maybe grep does that, I don't know?
It doesn't do that, and that's the first method I used in my hack.