
I have some serious love for mmap.

I don't know how many people's code I've optimized by eliminating:

FILE *f;

f = open("file.dat","r")

f.read(...)

and replacing it with mmap'd I/O
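For reference, a minimal sketch of what the mmap'd replacement looks like in actual C (the file name, the `count_newlines_mmap` helper, and the newline-counting workload are all illustrative, not from the original comment):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Count newlines in 'path' through a read-only mapping; returns -1 on error.
long count_newlines_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }

    const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (data == MAP_FAILED) return -1;

    // Pages fault in lazily: the kernel reads only what we actually touch,
    // instead of copying the whole file through a read() buffer.
    long n = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n') n++;

    munmap((void *)data, st.st_size);
    return n;
}
```

Note there's no explicit read loop or user-space buffer: the kernel pages the file in on demand and keeps it in the page cache.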



What language could you possibly be using here? The first line looks like C, but then in C on most systems, open returns an int (a file descriptor). Maybe you wrote open() but you meant fopen()? An easy mistake, but then how are we to interpret the syntax in the 3rd line? f.read(...)? What is this I don't even.

> I don't know how many people's code I've optimized...

I'm gonna venture a guess of 0 on this one.


haha, I just realized this... I meant C, but I think I've been screwing around in python so much lately that I kind of did that naturally by accident.


What if the input is not from a regular file?


In which cases is mmap faster?


data analysis, i.e. reading in large files, but overall it's pretty fast anyway. Furthermore, you don't read the whole file into memory, so your program only uses a small amount of memory while running. I'd work with grad students who would read 1-2GB of data into a process to analyze it, and they'd wonder why their system slowed down.

The best thing about it is that the OS does the caching. So, say I analyze a file: I run a program with 4 arguments that maps the file into memory. The next time it runs, the file is still in the page cache, so the OS doesn't bother re-reading it from disk. (Try doing a recursive grep on a directory, then try it again right after.)
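You can actually observe this caching from a program. Here's a Linux-specific sketch using mincore(), which reports which pages of a mapping are resident in the page cache (the `resident_pages` helper is my own illustration, not something from the thread):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map 'path', touch every page, then count how many pages mincore()
// reports as resident in the page cache. Returns -1 on error.
long resident_pages(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }

    unsigned char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (data == MAP_FAILED) return -1;

    // Touch one byte per page so each page is faulted into the cache.
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0;
    for (off_t i = 0; i < st.st_size; i += page) sink ^= data[i];
    (void)sink;

    long npages = (st.st_size + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec && mincore(data, st.st_size, vec) == 0) {
        resident = 0;
        for (long i = 0; i < npages; i++) resident += vec[i] & 1; // low bit = resident
    }
    free(vec);
    munmap(data, st.st_size);
    return resident;
}
```

Run it twice on a large file: the second run's pages are typically already resident, which is exactly why the second recursive grep feels instant.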


Thanks! That leaves the question: when is mmap slower, or is it always faster? And why isn't mmap the default usage pattern if it's better in most cases (and easier to use to boot)?



