The section is completely false. Memory-mapping a file maps pages from the page cache, which lives in main memory. I suppose the author confused this with memory-mapped I/O, and then confused port-based I/O with applications using syscalls. I can see how you end up there when you only ever view the system through high-level abstractions in Java.
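To make the point concrete, here is a minimal sketch of what memory-mapping a file looks like from user space, in Python rather than the article's Java. The helper names `read_via_mmap` and `demo` are mine, not from the article. On a typical OS the mapped pages are backed by the page cache, so this is still ordinary main memory and not some separate I/O path.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str) -> bytes:
    """Return a file's contents by reading through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ requests a read-only mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies bytes out of the mapped pages into a bytes object.
            # The mapping itself is just virtual memory backed by the file.
            return mapped[:]


def demo() -> bool:
    """Write a small temp file, read it back via mmap, and compare."""
    payload = b"hello page cache\n" * 1000
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        return read_via_mmap(path) == payload
    finally:
        os.remove(path)
```

Nothing here bypasses RAM: touching a mapped page that is not resident triggers a page fault, and the kernel fills it from the file, the same data a plain `read()` would have copied out.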
> Thanks to Wes McKinney for this brilliant innovation, its not a surprise that such an idea came from him and team, as he is well known as the creator of Pandas in Python. He calls Arrow as the future of data transfer.
I assume the confusion is with the author of the blog post and not Wes McKinney, so this callout in that context is a real disservice.
> The output which displays the time, shows the power of this approach. The performance boost is tremendous, almost 50%.
Keep in mind that this is reading a 2 MB file with 100k entries, which somehow manages to consume half a second of CPU time. The author compares wall time and not CPU time; both runs consume somewhere between 600 ms and over a second of wall time (again, handling 2 MB of data). I wouldn't be surprised if the first call simply takes so long because it is lazily loading a bunch of code.
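For what it's worth, a fairer measurement would record both wall time and CPU time and throw away a warm-up run first. Below is a rough Python sketch of that idea, not the article's benchmark: `measure`, `benchmark`, and the stand-in workload `parse_entries` are hypothetical names I made up for illustration.

```python
import time


def measure(fn, *args):
    """Run fn once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # monotonic wall-clock timer
    cpu_start = time.process_time()   # user + system CPU time of this process
    result = fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def benchmark(fn, *args, warmup=1, runs=5):
    """Discard warm-up runs, then return a list of (wall, cpu) pairs."""
    for _ in range(warmup):
        fn(*args)  # pays one-time costs such as lazy imports or cold caches
    return [measure(fn, *args)[1:] for _ in range(runs)]


def parse_entries(n):
    """Hypothetical stand-in workload; replace with the reader under test."""
    return sum(len(str(i)) for i in range(n))
```

If the first run is much slower than the rest, or wall time is far above CPU time, the difference is more likely startup or I/O wait than anything to do with the file format.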
Later on, memory consumption is measured, and one of the file format readers manages to consume -1 MB.
This article has a very bad smell.