It says "current" LLMs can't "genuinely" reason. Also, one of the researchers then posted an internship for someone to work on LLM reasoning.
I think the paper should've included human controls, because without them we don't know how strong the result is. By the same methodology, they may well have "proven" that humans can't reason either.
If they had human controls, they might well show that some humans can't do any better. But given how they generated the test cases, it seems unlikely to me that such controls would show humans cannot reason (and if that were actually the case, we couldn't trust ourselves to devise, execute, and interpret these tests in the first place!)
Some people will use any limitation of LLMs to deny there is anything to see here, while others will call this "moving the goalposts". The most interesting questions, I believe, involve figuring out what the differences actually are, setting aside whether LLMs are or are not AGIs.