Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
[dead]
6 months ago | hide | past | favorite


Hello! Here is a breakdown of GRPO (Group Relative Policy Optimization), used to train reasoning models like DeepSeek.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: