Yes, I mean having limits on everything! Every input has a max value: players, message count, message length, weapons a player can hold, etc. You can raise a limit by recompiling, and you should reuse slots (for size, but also to keep data contiguous) even though that is challenging under multicore utilization. And you can set the limits MUUUUUCH higher than the CPU can handle anyway, so this is not an issue.
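A minimal sketch of that idea in C++, assuming made-up limits and a hypothetical `Player` layout (the actual numbers would be tuned per game). All players live in one contiguous fixed-size array, and freed slots are reused instead of allocated:

```cpp
#include <array>
#include <cstddef>

// Hypothetical compile-time limits; raising one means recompiling.
constexpr std::size_t kMaxPlayers   = 64;
constexpr std::size_t kMaxWeapons   = 8;
constexpr std::size_t kMaxMsgLength = 128;

struct Player {
    int   weapons[kMaxWeapons]{};  // fixed slots, no dynamic allocation
    float health = 100.0f;
    bool  alive  = false;          // slot is free (reusable) when false
};

// All players in one contiguous, statically sized array.
std::array<Player, kMaxPlayers> g_players;

// Reuse a free slot instead of allocating; returns kMaxPlayers if full.
std::size_t acquire_player_slot() {
    for (std::size_t i = 0; i < kMaxPlayers; ++i) {
        if (!g_players[i].alive) {
            g_players[i].alive = true;
            return i;
        }
    }
    return kMaxPlayers;  // hit the compile-time limit
}
```

Because the storage is contiguous and bounded, the memory footprint is known at compile time and iteration stays cache-friendly.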
The first and last bottleneck of computers is, and will always be, RAM: speed, capacity, and energy alike. A 256GB RAM stick uses 80W!!!! Latency has been increasing since DDR3 (2007), and we have had caches to compensate for slow RAM since the 386 (1985) (3 always seems to be the last version, HL3 confirmed? >.<):
You need to cache-align everything perfectly: 1) all data has to be in an array (or a vector, which is a managed array, but I digress); 2) your types need to be atomic (int/float) so multiple cores can write to them at the same time without data races; 3) your groups/objects/structs need to perfectly fill (with padding) 64 bytes, because then multiple cores cannot invalidate each other's cache lines unless they are writing to the same struct.
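The three points above can be sketched in C++; the `Entity` struct and its field layout are hypothetical, but the mechanics (atomic members, explicit padding, `alignas(64)`) are standard:

```cpp
#include <atomic>
#include <cstdint>

// One struct = one 64-byte cache line, so two cores writing to
// *different* entries never share (and never invalidate) a line.
struct alignas(64) Entity {
    std::atomic<std::int32_t> x{0};  // point 2: atomic, so concurrent
    std::atomic<std::int32_t> y{0};  //          writers don't race
    float velocity[8]{};             // 32 bytes of payload
    std::uint8_t pad[24]{};          // point 3: explicit padding to 64 bytes
};

// Both checked at compile time: size and starting alignment.
static_assert(sizeof(Entity) == 64, "must fill one cache line exactly");
static_assert(alignof(Entity) == 64, "must start on a cache-line boundary");
```

An array of `Entity` (point 1) then puts every element on its own cache line automatically.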
So SoA vs. AoS never was an argument! AoS where structures are exactly 64 bytes is the only thing all programmers must do for eternity! This is the law of both x86 and ARM.
So an array of float Mat4x4s is perfect (4 × 4 × 4 bytes = 64), and I suspect that is where the 64 bytes came from. But here is another struct just as an example:
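The struct the comment promises isn't shown; a hypothetical stand-in that pads to exactly 64 bytes might look like this (the fields are invented for illustration):

```cpp
#include <cstdint>

// Hypothetical game-object struct, hand-padded to one cache line.
struct alignas(64) Particle {
    float pos[3];          // 12 bytes
    float vel[3];          // 12 bytes
    float mass;            //  4 bytes
    float lifetime;        //  4 bytes
    std::int32_t id;       //  4 bytes  -> 36 bytes of real data
    std::uint8_t pad[28];  // 28 bytes of padding -> 64 total
};

static_assert(sizeof(Particle) == 64, "one particle per cache line");
```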
But things don’t fit into 64 bits all the time, and then you get tearing. This is observable, and now you have to pay for “proper” synchronization. Also, Apple’s M1 gets its speed largely from bigger caches, so I don’t think it’s a good choice to go down this road.
Most applications have plenty of objects all around that are rarely used and are perfectly fine with being managed by the GC as is, and a tiny performance critical core where you might have to care a tiny bit about what gets allocated. This segment can be optimized other ways as well, without hurting the maintainability, speed of progress etc of the rest of the codebase.
Can you ask the OS to give you a certain core type?
128 bytes is just 2 × 64, so the alignment still works! The risk of cache invalidation does go up, though, since two cores can now share a line even when they are not writing to the exact same structure.
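A sketch of the 128-byte case, assuming a CPU whose effective sharing granularity is 128 bytes (as on Apple's M-series); the struct and its payload are invented for illustration:

```cpp
// Padding to 128 bytes spans exactly two 64-byte lines, so the
// layout still divides evenly on both 64- and 128-byte hardware.
struct alignas(128) WideEntry {
    float data[24];         // 96 bytes of payload
    unsigned char pad[32];  // pad 96 -> 128
};

static_assert(sizeof(WideEntry) == 128, "exactly 2 x 64-byte lines");
static_assert(alignof(WideEntry) == 128, "starts on a 128-byte boundary");
```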