Yeah, but an even more important system programming trick is to measure what you're doing every time. Performance optimization at this level is never about just trusting tools. If you aren't willing to be reading generated machine code and checking perf counters, you aren't really going to get much benefit.
And if you're willing to check your optimizations by reading disassembly, tricks like stuffing tag bits into the bottom of aligned pointer values is pretty routine.
And if you're willing to check your optimizations by reading disassembly, tricks like stuffing tag bits into the bottom of aligned pointer values is pretty routine.