dartos on Sept 13, 2024 | on: Notes on OpenAI's new o1 chain-of-thought models

Tbf RL is pretty incredible. I trained a model to play a novel video game using only screenshots and a score using RL and I discovered how not to lose