
I have some serious love for mmap.

I don't know how many people's code I've optimized by eliminating:

FILE *f;

f = open("file.dat","r")

f.read(...)

and replacing it with mmap'd I/O
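For reference, a minimal sketch of what the mmap'd replacement looks like in actual C (the file name, the `count_newlines_mmap` helper, and the newline-counting workload are all illustrative, not from the original comment):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Count newlines in 'path' through a read-only mapping; returns -1 on error.
long count_newlines_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }

    const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (data == MAP_FAILED) return -1;

    // Pages fault in lazily: the kernel reads only what we actually touch,
    // instead of copying the whole file through a read() buffer.
    long n = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n') n++;

    munmap((void *)data, st.st_size);
    return n;
}
```

Note there's no explicit read loop or user-space buffer: the kernel pages the file in on demand and keeps it in the page cache.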



What language could you possibly be using here? The first line looks like C, but then in C on most systems, open returns an int (a file descriptor). Maybe you wrote open() but you meant fopen()? An easy mistake, but then how are we to interpret the syntax in the 3rd line? f.read(...)? What is this I don't even.

> I don't know how many people's code I've optimized...

I'm gonna venture a guess of 0 on this one.


haha, I just realized this... I meant C, but I think I've been screwing around in python so much lately that I kind of did that naturally by accident.


What if the input is not from a regular file?


In which cases is mmap faster?


data analysis, i.e. reading in large files, but overall it's pretty fast anyway. Furthermore, you don't read the whole file into memory, so your program only uses a small amount of memory while running. I'd work with grad students who would read 1-2GB of data into a process to analyze it, and they'd wonder why their system slowed down.

The best thing about it is that the OS does the caching. So, say I analyze a file: I run a program with 4 arguments that maps the file into memory. The next time it runs, the file is still in the page cache, so the OS doesn't bother re-reading it from disk. (Try doing a recursive grep on a directory, then try it again right after.)
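You can actually observe this caching from a program. Here's a Linux-specific sketch using mincore(), which reports which pages of a mapping are resident in the page cache (the `resident_pages` helper is my own illustration, not something from the thread):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map 'path', touch every page, then count how many pages mincore()
// reports as resident in the page cache. Returns -1 on error.
long resident_pages(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }

    unsigned char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (data == MAP_FAILED) return -1;

    // Touch one byte per page so each page is faulted into the cache.
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0;
    for (off_t i = 0; i < st.st_size; i += page) sink ^= data[i];
    (void)sink;

    long npages = (st.st_size + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec && mincore(data, st.st_size, vec) == 0) {
        resident = 0;
        for (long i = 0; i < npages; i++) resident += vec[i] & 1; // low bit = resident
    }
    free(vec);
    munmap(data, st.st_size);
    return resident;
}
```

Run it twice on a large file: the second run's pages are typically already resident, which is exactly why the second recursive grep feels instant.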


Thanks! That leaves the question: when is mmap slower, or is it always faster? And why isn't mmap the default usage pattern if it's better in most cases (and easier to use to boot)?



