Yeah, I did a lot of traditional optimization problems during my Ph. D., this ty... | Hacker News


lcnielsen 38 days ago | on: How does gradient descent work?

Yeah, I did a lot of traditional optimization problems during my Ph.D., and this type of expression pops up all the time with higher-order gradient-based methods. You rescale or otherwise adjust the gradient based on some system-characteristic eigenvalues to promote convergence without overshooting too much.
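A rough sketch of what that rescaling can look like, under assumptions that are mine rather than anything stated in the thread: the objective is a toy quadratic `0.5 * x @ A @ x - b @ x`, the "system-characteristic eigenvalues" are taken to be the eigenvalues of its Hessian `A`, and the helper name `eigen_rescaled_step` is made up for illustration. It compares plain gradient descent, with a step size capped by the largest eigenvalue to avoid overshooting, against a step that divides the gradient by the eigenvalue along each eigen-direction.

```python
import numpy as np

def eigen_rescaled_step(grad, hessian, damping=0.0):
    """Hypothetical helper: rescale the gradient along each Hessian
    eigen-direction by 1 / (|eigenvalue| + damping)."""
    eigvals, eigvecs = np.linalg.eigh(hessian)   # symmetric eigendecomposition
    coeffs = eigvecs.T @ grad                    # gradient in the eigenbasis
    return eigvecs @ (coeffs / (np.abs(eigvals) + damping))

# Assumed toy problem: an ill-conditioned quadratic (condition number 100).
A = np.array([[100.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)                   # exact minimizer

def grad_f(x):
    return A @ x - b

def run_plain(steps):
    # Step size 1 / lambda_max: safe against overshoot, slow in flat directions.
    lr = 1.0 / np.linalg.eigvalsh(A).max()
    x = np.zeros(2)
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

def run_rescaled(steps):
    # Each eigen-direction gets its own effective step size.
    x = np.zeros(2)
    for _ in range(steps):
        x = x - eigen_rescaled_step(grad_f(x), A)
    return x

plain_err = np.linalg.norm(run_plain(10) - x_star)
rescaled_err = np.linalg.norm(run_rescaled(1) - x_star)
print(f"plain GD, 10 steps: error {plain_err:.3f}")
print(f"rescaled, 1 step:   error {rescaled_err:.2e}")
```

On an exact quadratic with zero damping, the rescaled step coincides with a Newton step, so it lands on the minimizer at once, while plain gradient descent is still far off in the low-curvature direction. On real problems the eigenvalues are usually only estimated, and a nonzero `damping` is one common way to keep the step from blowing up along near-flat directions.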

d3m0t3p 38 days ago

This sounds a lot like what the Muon / Shampoo optimizers do.

d3m0t3p 38 days ago

Would you have some literature about that?

lcnielsen 37 days ago

There's a ton, but it's pretty scattered. Yurii Nesterov's a big name, for example.
