
Yeah, I worked on a lot of traditional optimization problems during my PhD, and this type of expression pops up all the time in higher-order gradient-based methods. You rescale or otherwise adjust the gradient based on eigenvalues characteristic of the system, to promote convergence without overshooting too much.
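
To make that concrete, here is a minimal sketch of the idea on a toy problem, not any particular published method. It assumes a two-variable quadratic whose Hessian is diagonal, so the eigenvalues are known up front and the eigenvectors are just the coordinate axes. The names `LAM`, `plain_step`, `rescaled_step`, and `run` are made up for illustration.

```python
# Toy quadratic: f(x) = 0.5 * (LAM[0] * x[0]**2 + LAM[1] * x[1]**2).
# Assumption for brevity: the Hessian is diagonal, so its eigenvalues
# are the entries of LAM and its eigenvectors are the coordinate axes.
LAM = (1.0, 100.0)


def grad(x):
    """Gradient of the toy quadratic at x."""
    return [LAM[0] * x[0], LAM[1] * x[1]]


def plain_step(x, lr):
    """Ordinary gradient descent: one step size for every direction."""
    g = grad(x)
    return [x[i] - lr * g[i] for i in range(2)]


def rescaled_step(x, damping):
    """Divide each gradient component by its eigenvalue (a damped,
    Newton-like step), so stiff and flat directions shrink at the same rate."""
    g = grad(x)
    return [x[i] - damping * g[i] / LAM[i] for i in range(2)]


def run(step, x0, n, **kwargs):
    """Apply `step` n times starting from x0."""
    x = list(x0)
    for _ in range(n):
        x = step(x, **kwargs)
    return x


x0 = [1.0, 1.0]
# The plain step size has to stay below 2 / max(LAM) = 0.02, or the
# stiff direction diverges; that makes the flat direction crawl.
plain = run(plain_step, x0, 10, lr=0.019)
rescaled = run(rescaled_step, x0, 10, damping=0.5)
print("plain:   ", plain)
print("rescaled:", rescaled)
```

With the plain step, the largest eigenvalue caps the step size, so the stiff direction oscillates while the flat one barely moves. After rescaling, both coordinates contract by the same factor per step, and the damping factor is what keeps the step from overshooting.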


This sounds a lot like what the Muon / Shampoo optimizers do.


Do you have any literature on that?


There's a ton but it's pretty scattered. Yurii Nesterov's a big name, for example.



