m101 on Sept 13, 2024, on: Notes on OpenAI's new o1 chain-of-thought models

Perhaps the smaller model used in o1 is overtrained on arXiv and code relative to 4o (or undertrained on legal text).