I loved working on the Cell. I remember writing my first SPU code when the PS3 dev kit was a huge stainless steel box. I later went on to implement predictive physics for racing line optimization for Midnight Club on the SPUs. It was a lot of fun. The weird thing was that the SPU's floating point unit had a different intermediate bit size than the PPU's, so for deterministic results you had to make sure a given piece of work was always dispatched to the same kind of core, PPU or SPU. That was essential for network games where the sim had to run bit for bit the same across players.
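To make the determinism point concrete, here is a minimal sketch of how the same integration loop can produce different bits when the intermediate precision differs. It does not model the real SPU or PPU floating point units; the helper names (`to_f32`, `integrate_narrow`, `integrate_wide`) and the 32-bit versus 64-bit split are assumptions chosen purely for illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_bits(x: float) -> str:
    """Hex string of the 32-bit pattern, handy for bit-for-bit comparison."""
    return struct.pack(">f", x).hex()


def integrate_narrow(pos: float, vel: float, dt: float, steps: int) -> float:
    """Hypothetical 'unit A': every intermediate is rounded to 32 bits."""
    pos, vel, dt = to_f32(pos), to_f32(vel), to_f32(dt)
    for _ in range(steps):
        pos = to_f32(pos + to_f32(vel * dt))
    return pos


def integrate_wide(pos: float, vel: float, dt: float, steps: int) -> float:
    """Hypothetical 'unit B': intermediates stay at 64 bits, rounded once at the end."""
    pos, vel, dt = to_f32(pos), to_f32(vel), to_f32(dt)
    for _ in range(steps):
        pos = pos + vel * dt
    return to_f32(pos)


narrow = integrate_narrow(0.0, 1.0, 0.1, 1000)
wide = integrate_wide(0.0, 1.0, 0.1, 1000)
print(f32_bits(narrow), f32_bits(wide), narrow == wide)
```

Both functions take identical inputs and do the same math on paper, yet the stored results diverge. In a lockstep network sim that small difference is enough to desync players, which is why the same work always had to land on the same kind of core.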
I'm a systems nerd, and I found working with it quite challenging, but rewarding. It's been many years, but I still remember a number of the challenges. SPEs didn't have shared memory access to RAM, so data transfer was your problem to solve as a developer, and each SPE had only 256k of local RAM. These things were very fast for the day, so they'd crunch through the data very quickly. We double-buffered the RAM, using about 100k for the data being processed while the other 100k served as a read buffer for the DMA engine.
That was the trickiest part - getting the data in and out of the thing. You had 6 SPEs available to you out of the 8 on the chip (one was disabled for yield and one was reserved by the OS), and keeping them all fed was a challenge because it required nearly optimal usage of the DMA engine. Memory access was slow, something over 1000 cycles from issuing the DMA until data started coming in.
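The double-buffering scheme described above can be sketched roughly like this. It is a toy model of the schedule only: `fake_dma_get`, `process_chunk` and `stream_double_buffered` are hypothetical names, and the "DMA" here is an ordinary synchronous slice, whereas on real hardware the fetch would be an asynchronous request that you wait on before touching the buffer.

```python
def fake_dma_get(main_memory, offset, size):
    """Stand-in for a DMA read from main RAM into local store.

    Assumption: synchronous here; a real transfer would run in the
    background while the processor works on the other buffer.
    """
    return main_memory[offset:offset + size]


def process_chunk(chunk):
    """Placeholder for the actual number crunching."""
    return sum(chunk)


def stream_double_buffered(main_memory, chunk_size):
    """Walk main_memory in chunks using two alternating local buffers."""
    buffers = [None, None]
    current = 0
    total = 0

    # Prime the pipeline: fetch the first chunk before any work starts.
    buffers[current] = fake_dma_get(main_memory, 0, chunk_size)
    offset = chunk_size

    while buffers[current]:
        nxt = 1 - current
        # Request the next chunk first, so the transfer latency is hidden
        # behind the processing of the chunk we already have.
        buffers[nxt] = fake_dma_get(main_memory, offset, chunk_size)
        offset += chunk_size

        total += process_chunk(buffers[current])

        # Swap roles: the freshly filled buffer becomes the work buffer.
        current = nxt

    return total


data = list(range(1000))
print(stream_double_buffered(data, 64) == sum(data))
```

The point of the ordering is that the next fetch is always issued before the current chunk is processed. With a latency of over 1000 cycles per transfer, doing it the other way round would leave the SPE idle at the start of every chunk.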
Back then, C++ was all the rage and people did their various C++ patterns, but due to the space for instructions being so limited, we just hand-wrote some code to run on the SPUs which didn't match the rest of the engine, so we ended up gluing together two dissimilar codebases.
I miss the cleverness required back then, but I don't miss the complexity. Things are so much simpler now that game consoles are basically PCs with PC-style dev tools. Also, as much as I complain about the PS3, at least it wasn't the PS2.
Yep, all valid. When I started on it we had to do everything ourselves. But by the time I did serious dev on it, our engine team had already built vector/matrix libraries that worked on both PPU and SPU, and had a dispatcher that took care of all the double buffering for me.
Indeed, anyone who mastered the parallelism of the PS3 bettered themselves and found that the knowledge gained applied to all the multicore architectures that followed. Our PC builds greatly benefitted from the architecture changes forced on us by the PS3.