Hacker News

		[dead]
		6 months ago

luis_likes_math 6 months ago

Hello! Here is a breakdown of GRPO (Group Relative Policy Optimization), the reinforcement learning algorithm used to train reasoning models such as DeepSeek-R1.
