
evertedsphere on Jan 25, 2025 | on: TinyZero: Reproduction of DeepSeek R1 Zero in coun...

reproducing the alphazero-like "model learns to reason on its own without supervised fine-tuning" phenomenon that deepseek-r1-zero exhibited