> reinforcement learning, which is what most people mean when they talk about reasoning LLMs
A popularity contest has no place in a technical discussion, and even then it is not clear what evidence you base that statement on.
IMO, a reasoning model is a model trained on lots of reasoning steps, so it is strong at producing those.
RL is used in niches where there is not much training data, so data is generated synthetically, which produces lots of garbage, and the model needs feedback to adjust. Multiplication is not such a niche.
> This shows that o3-mini has >90% accuracy at multiplying numbers up to 8-digits, and it is capable of multiplying numbers much larger than that. Whereas, gpt-4o could only multiply 2-digit numbers reliably.
It could simply be that one model has training data for this and the other doesn't; you can't draw any conclusion without inspecting OpenAI's data.
Also, your examples actually demonstrate that frontier LLMs can't learn and reproduce a trivial algorithm reliably, and the results are in the quality range of a stochastic parrot.
1. This is obviously not about popularity... It is about capability. You cannot use crappy 2-year-old models with chain-of-thought to make inferences about frontier reasoning models that were released less than a year ago.
2. It is literally impossible for the models to have memorised all the results from multiplying 8-digit numbers. There are at least 10^14 possible 8-digit multiplications (a very loose lower bound; the full count of ordered pairs is closer to 8×10^15), which from an information-theory perspective would require on the order of 48 PiB of data to hold as a naive lookup table (see the sketch after this list). They have to be applying algorithms internally to perform this task, even if that algorithm is just decompressing some unbelievably well-compressed form of the results.
3. If you expect 100% reliability, obviously all humans would also fail. Therefore, do humans not reason? The answer is obviously no. We are trying to demonstrate that LLMs can exhibit reasoning here, not whether or not their reasoning has flaws or limitations (which it obviously does).
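For what it's worth, here is a minimal back-of-the-envelope sketch of that storage estimate. The only inputs are the pair count and the bits per product; the script itself is just an illustration of the arithmetic:

```python
import math

# Count of ordered pairs of 8-digit numbers (each in 10^7 .. 10^8 - 1).
n_values = 10**8 - 10**7          # 9 * 10^7 distinct 8-digit numbers
n_pairs = n_values ** 2           # ~8.1 * 10^15 ordered pairs

# Each product is below 10^16, so it fits in ceil(log2((10^8 - 1)^2)) = 54 bits.
bits_per_product = math.ceil(math.log2((10**8 - 1) ** 2))

total_bytes = n_pairs * bits_per_product / 8
print(f"{n_pairs:.2e} pairs, {bits_per_product} bits per product")
print(f"~{total_bytes / 2**50:.1f} PiB for a naive lookup table")
```

The exact constant doesn't matter; the point is that a naive product table is orders of magnitude beyond what could fit in a model's weights.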
> You cannot use crappy 2-year-old models with chain-of-thought to make inferences about frontier reasoning models that were released less than a year ago.
The idea is to train a new specialized model, which could specifically demonstrate whether an LLM can learn multiplication.
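To make that concrete, here is a minimal sketch of how such an experiment might generate its training data; the prompt/target format, the digit ranges, and the idea of holding out digit lengths are my own illustration, not anything from OpenAI:

```python
import random

def make_example(max_digits: int = 8) -> dict:
    """One multiplication problem as a prompt/target pair."""
    d1 = random.randint(1, max_digits)
    d2 = random.randint(1, max_digits)
    a = random.randint(10 ** (d1 - 1), 10 ** d1 - 1)
    b = random.randint(10 ** (d2 - 1), 10 ** d2 - 1)
    return {"prompt": f"{a} * {b} =", "target": str(a * b)}

# A toy corpus; a real experiment would generate millions of examples,
# fine-tune on them, and hold out some digit lengths to test whether the
# model generalizes rather than memorizes.
dataset = [make_example() for _ in range(10)]
for ex in dataset:
    print(ex["prompt"], ex["target"])
```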
> It is literally impossible for the models to have memorised how to multiply 8-digit numbers. There are at least 10^14 8-digit multiplications that are possible
Sure, but they could memorize fragments: rules of the form "if the operands contain this sequence of digits, then the result contains that sequence of digits", which is a much smaller space.
> If you expect 100% reliability, obviously all humans would also fail. Therefore, do humans not reason?
Humans fail here because they are weak at this: they can't reliably do arithmetic and sometimes make mistakes. I also speculate that if you give a human enough time and ask them to triple-check the calculations, the result will be very good.
We also cap how long we let reasoning LLMs think for. OpenAI researchers have already discussed models they let reason for hours that could solve much harder problems.
But regardless, I feel like this conversation is useless. You are clearly motivated not to think LLMs are reasoning by 1) only looking at crappy old models as some sort of evidence about new models, which is nonsense, and 2) coming up with nonsensical arguments about how they could still just be memorising answers. Even if they memorised sequences, they would still have to put them together to get the exact right answer to 8-digit multiplications in >90% of cases. That requires the application of algorithms, aka reasoning.