For AlphaChip, pre-training is just training. You train, and save the weights in...

clickwiseorange · on Sept 27, 2024

It's not that one needs an excuse. The Google CT repo said clearly you don't need to pretrain. "supported" usually includes at least an illustration, some scripts to get it going - no such thing there before Kahng's paper. Pre-trained was not recommended and was not supported.

Everything optimized in Nature RL is an approximation. HPWL is where you start, and RL uses it in the objective function too. As shown in "Stronger Baselines", RL loses a lot by HPWL - so much that nothing else can save it. If your wires are very long, you need routing tracks to route them, and you end up with congestion too.

SA consistently produces better solutions than RL for various time budgets. That's what matters. Both papers have shown that SA produces competent solutions. You give SA more time, you get better solutions. In a fair comparison, you give equal budgets to SA and RL. RL loses. This was confirmed using Google's RL code and two independent SA implementations, on many circuits. Very definitively. No, ML did not have SA beat - please read the papers.

Cadence hasn't funded Kahng for a long time. In fact, Google funded Kahng more recently, so he has all the incentives to support Google. Markov's LinkedIn page says he worked at Google before. Even Chatterjee, of all people, worked at Google.

Google's open-source tool is a head fake, it's practically unusable.

Update: I'll respond to the next comment here since there's no Reply button.

1. The Nature paper said one thing, the code did something else, as we've discovered. The RL method does some training as it goes. So, pre-training is not the same as training. Hence "pre". Another problem with pretraining in Google work is data contamination - we can't compare test and training data. The Google folks admitted to training and testing on different versions of the same design. That's bad. Rejection-level bad.

2. HPWL is indeed a nice simple objective. So nice that Jeff Dean's recent talks use it. It is chip design. All commercial circuit placers without exception optimize it and report it. All EDA publications report it. Google's RL optimized HPWL + density + congestion

3. This shows you aren't familiar with EDA. Simulated Annealing was the king of placement from mid 1980s to mid 1990s. Most chips were placed by SA. But you don't have to go far - as I recall, the Nature paper says they used SA to postprocess macro placements.

SA can indeed find mediocre solutions quickly, but keeps on improving them, just like RL. Perhaps, you aren't familiar with SA. I am. There are provable results showing SA finds optimal solution if given enough time. Not for RL.

negativeonehalf · on Sept 27, 2024

The Nature paper describes the importance of pre-training repeatedly. The ability to learn from experience is the whole point of the method. Pre-training is just training and saving the weights -- this is ML 101.

I'm glad you agree that HPWL is a proxy metric. Optimizing HPWL is a fun applied math puzzle, but it's not chip design.

I am unaware of a single instance of someone using SA to generate real-world, usable macro layouts that were actually taped out, much less for modern chip design, in part due to SA's struggles to manage congestion, resulting in unusable layouts. SA converges quickly to a bad solution, but this is of little practical value.

clickwiseorange · on Sept 28, 2024

1. The Nature paper said one thing, the code did something else, as we've discovered. The RL method does some training as it goes. So, pre-training is not the same as training. Hence "pre". Another problem with pretraining in Google work is data contamination - we can't compare test and training data. The Google folks admitted to training and testing on different versions of the same design. That's bad. Rejection-level bad.

2. HPWL is indeed a nice simple objective. So nice that Jeff Dean's recent talks use it. It is chip design. All commercial circuit placers without exception optimize it and report it. All EDA publications report it. Google's RL optimized HPWL + density + congestion

3. This shows you aren't familiar with EDA. Simulated Annealing was the king of placement from mid 1980s to mid 1990s. Most chips were placed by SA. But you don't have to go far - as I recall, the Nature paper says they used SA to postprocess macro placements.

SA can indeed find mediocre solutions quickly, but keeps on improving them, just like RL. Perhaps, you aren't familiar with SA. I am. There are provable results showing SA finds optimal solution if given enough time. Not for RL.

AshamedCaptain · on Sept 28, 2024

SA and HPWL are most definitely used as of today for the chips that power the GPUs used for "ML 101". But frankly this has the same value as saying "some sort algorithm is used somewhere" -- they're well entrenched basics of the field. To claim that SA produces "bad congestion" is like claiming that using steel pans produces bad cooking -- needs a shitton of context and qualification since you cannot generalize this way.

foobarqux · on Sept 27, 2024

I think clicking the time stamp field in the comment will allow you to get a reply box.