
Yes, I mean having limits on everything! Every input has a max value: player count, message count, message length, how many weapons a player can hold, etc. You can raise a limit by recompiling. You should reuse slots (for size, but also to keep data contiguous) even if that is challenging with multicore utilization, and you can add MUUUUUCH more than the CPU can handle to compute anyway, so this is not an issue.

The first and last bottleneck of computers is and will always be RAM: speed, capacity and energy. A 256GB RAM stick uses 80W!!!! Latency has been increasing since DDR3 (2007), and we have had caches to compensate for slow RAM since the 386 (1985). (3 always seems to be the last version, HL3 confirmed? >.<)

You need to cache-align everything perfectly: 1) all data has to be in an array (or vector, which is a managed array, but I digress); 2) your types need to be atomic so multiple cores can write to them at the same time without data races or torn values (int/float); 3) your groups/objects/structs need to exactly fill (with padding) 64 bytes. Then multiple cores cannot invalidate each other's cache lines unless they are writing to the same struct.

So SoA vs. AoS never was an argument! AoS where structures are exactly 64 bytes is the only thing all programmers must do for eternity! This is the law of both x86 and ARM.

So an array of float Mat4x4 (4 × 4 × 4 bytes = 64) is perfect, and I suspect that is where the 64 bytes came from. But here is another struct, just as an example:

  struct Node {
    int mesh, skin;
    Vec3 spot, pace;
    Quat look, spin;
  };


But things don’t fit into 64 bytes all the time, and then you get tearing. This is observable, and now you have to pay for “proper” synchronization. Also, Apple’s M1 gets its speed pretty much from bigger caches, so I don’t think it’s a good choice to go down this road.

Most applications have plenty of objects all around that are rarely used and are perfectly fine with being managed by the GC as is, plus a tiny performance-critical core where you might have to care a tiny bit about what gets allocated. That segment can be optimized in other ways as well, without hurting the maintainability, speed of progress, etc. of the rest of the codebase.


64 bytes, 512 bits.

Cache lines are 64 bytes on all modern hardware.

They will probably never change this value ever.

Everything fits into 64 bytes if you make the effort.

And if it doesn't, you have to use two arrays of 64-byte structures and pad the last.

This is non-negotiable and I'm completely baffled nobody has mentioned this yet.

I call this law: Ao64bS (did I invent my first law?) :D


It may be a language barrier thing, but then we are talking about different things. Also, that is architecture-dependent.


Nope, on all modern CPUs (x86 and ARM) this is 64 bytes, and has been for a looong time...


Cache lines are 128 bytes on M1.

But since it’s AMP (asymmetric multiprocessing: performance plus efficiency cores) and not SMP, sharing work across cores doesn’t necessarily work how you expect it to.


Can you ask the OS to give you a certain core type?

128 bytes is a perfect 2 × 64! The risk of cache invalidation goes up, since two cores writing to adjacent 64-byte structures now share a line, but the alignment still works!

Good job Apple!


There absolutely are modern systems that have e.g. 128 byte cache lines (M1).


> A 256GB RAM stick uses 80W!!!!

This seems very high to me, since most high-capacity server sticks have no heatsink. Have you got a source?



