Yes, I remember using these techniques for addressing 2D dense matrices. The specific case was matrices that had grown so large that plain linear traversal along a row or a column was causing page faults. It was a perfect example of pathologically bad memory access when the data is stored in a row-major or column-major layout.
The interleaved data layouts gave very impressive speedups. The end goal was to embed these layouts into the compiler tools, so that the programmer would write normal two-dimensional matrix allocations and get the optimized layout based on analysis.
I wrote terrible C to accomplish these layouts and minimize the instruction counts.
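For anyone unfamiliar with the idea, here is a minimal sketch of one common interleaved layout, Morton (Z-order) indexing, in C. It is an illustration only, not the code described above: the names spread_bits, morton_index and morton_at are hypothetical, and it assumes a square matrix padded to a power-of-two dimension no larger than 65536.

```c
#include <stdint.h>

/* Spread the low 16 bits of x so that a zero bit sits between
 * each pair of original bits (bit i moves to bit 2*i). */
static uint32_t spread_bits(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Interleave row and column bits: row bits land on odd positions,
 * column bits on even positions. Neighbouring (row, col) pairs in
 * either direction therefore tend to stay close in memory. */
static uint32_t morton_index(uint32_t row, uint32_t col)
{
    return (spread_bits(row) << 1) | spread_bits(col);
}

/* Address element (row, col) of a matrix stored in Morton order.
 * Assumption: base points to n*n doubles with n a power of two. */
static double *morton_at(double *base, uint32_t row, uint32_t col)
{
    return base + morton_index(row, col);
}
```

With this layout each aligned 2x2 block occupies four consecutive slots, each aligned 4x4 block occupies sixteen, and so on, so walking down a column no longer touches a new page on every step the way it can with a large row-major array. The cost is the extra index arithmetic on every access, which is presumably why minimizing instruction counts mattered.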