
Why aren’t they using this technique to design better transformer architectures or completely novel machine learning architectures in general? Are plain or mostly plain transformers really peak? I find that hard to believe.


Because chip placement and the design of neural network architectures are entirely different problems, so this solution won't magically transfer from one to the other.


And AlphaGo was trained to play Go. The point is training a model through self-play to build neural network architectures. If it can learn to play Go and to place chips, I don't see why it couldn't be trained to design novel ML architectures.
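For what it's worth, the closest existing work is probably policy-gradient neural architecture search (Zoph & Le) rather than AlphaGo-style self-play. Here's a minimal sketch of that idea: a tiny REINFORCE controller learns to pick per-layer operations. The search space, the `proxy_reward` function, and all names are hypothetical stand-ins; in real NAS the reward would be the validation accuracy of a fully trained child network, which is exactly what makes it so expensive.

```python
# Minimal sketch of RL-based neural architecture search (REINFORCE).
# Hypothetical search space and toy reward; purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical search space: choose one op for each of 3 layers.
CHOICES = [("attn", "conv", "mlp")] * 3

# Controller: one learned logit vector per decision (a tiny policy).
logits = [np.zeros(len(c)) for c in CHOICES]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_architecture():
    """Sample one architecture (a list of op indices) from the policy."""
    return [rng.choice(len(l), p=softmax(l)) for l in logits]

def proxy_reward(arch):
    """Stand-in for 'train the child network, return validation accuracy'.
    Here we just pretend attention in every slot is best."""
    return sum(1.0 for a in arch if a == 0) / len(arch)

baseline, lr = 0.0, 0.5
for step in range(200):
    arch = sample_architecture()
    r = proxy_reward(arch)
    baseline = 0.9 * baseline + 0.1 * r    # moving-average baseline
    for l, a in zip(logits, arch):
        grad = -softmax(l)
        grad[a] += 1.0                     # d log pi(a) / d logits
        l += lr * (r - baseline) * grad    # REINFORCE update

best = [CHOICES[i][int(np.argmax(l))] for i, l in enumerate(logits)]
print("controller's preferred architecture:", best)
```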


Sure, they could choose to work on that problem. But why do you think it's more important or worthwhile than chip design or any other problem they might choose to work on? My point was that it's not trivial to make self-play work for some other problem, so given all the problems in the world, why did you single out neural network architecture design? Especially since it's not the transformer architecture that is really holding back AI progress.


Recursive self-improvement.



