>All of our tests and benchmarks account for repeatability.
What does repeatability have to do with intelligence? If I ask a 6 year old "Is 1+1=2" I don't change my estimation of their intelligence the 400th time they answer correctly.
>The machine in question has no problem replicating its results on whatever test
What machine is that? All the LLMs I have tried produce neat results on very narrow topics but fail on consistency and generality. Which seems like something you would want in a general intelligence.
>What does repeatability have to do with intelligence? If I ask a 6 year old "Is 1+1=2" I don't change my estimation of their intelligence the 400th time they answer correctly.
If your 6 year old can only answer correctly a few times out of that 400 and you don't change your estimation of their understanding of arithmetic, then I sure hope you are not a teacher.
>What machine is that? All the LLMs I have tried produce neat results on very narrow topics but fail on consistency and generality. Which seems like something you would want in a general intelligence.
No LLM will score 80% on benchmark x today and then 50% on the same benchmark 2 days later. That doesn't happen, so the convoluted setup OP had is meaningless. LLMs do not 'fail' on consistency or generality.