Much of our reasoning develops from metaphorical comparisons with our embodied experiences. For example, we certainly gain spatial awareness from embodiment, and it is striking how much of our language of reasoning relies on spatial metaphors: we grasp ideas, follow arguments, and build on foundations. Can we expect a non-embodied AI, whose only view of the world is the text we feed it, to develop a similar understanding?
That’s the steelmanned argument, at least.