> At least with respect to this problem, they had no theory of mind.
This is very interesting and insightful, but I take issue with the above conclusion. Your average software engineer would probably fail to code up a python solution to this problem. But most people would agree that the average software engineer, and the average person, possesses some theory of mind.
This seems to be a pattern I'm noticing with AI: the goalposts keep moving. When I was a kid, the Turing test was the holy grail of "artificial intelligence." Now your run-of-the-mill LLM can breeze through the Turing test, but no one seems to care. "They are just imitating us; that doesn't count." Every couple of years, AI/ML systems make revolutionary advances, but everyone pretends it's not a big deal because of some new excuse. The latest one: "LLMs can't write a Python program to solve an entire class of very challenging logic problems; therefore LLMs possess no theory of mind."
Let me stick my neck out and say something controversial. Are the latest LLMs as smart as Peter Norvig? No. Are they smarter than your average human? Yes. Can they outperform your average human at a randomly chosen cognitive task that has real-world applications? Yes. This is pretty darn revolutionary. We have crossed the Rubicon. We are watching history unfold in real time.
We once thought that a computer could not beat a grandmaster in chess or pass the Turing test without some undefined special human property. We were wrong about the computer needing this undefined special human property.
A spreadsheet has been much better at math than the average person for a long time too. A spreadsheet is a very useful human tool. LLMs are a revolutionary, useful tool. For some people, though, that doesn't seem to be enough, and they have to insist that the LLM has the undefined special human property.
Does that count as a program that solves the problem? Your program finds the unique days/months, but you're hardcoding the part where the program discerns who knows what.
Maybe that counts, I don't know, I'm genuinely asking.
He only specified that it should be flexible with respect to the specific dates, so I think so. If people knew different things it would be a different problem.
Norvig’s solution is very elegant, and basically establishes an API for declaring who knows what. I learn a lot about readability every time I read one of his programs.
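To make the "API for who knows what" idea concrete, here is a minimal sketch in that spirit. It is not Norvig's actual code; it uses the well-known "Cheryl's Birthday" puzzle as a stand-in, and the date list, function names, and structure are my own illustration. The key move is that each public statement becomes a filter over the remaining candidate dates, rather than being hardcoded.

```python
# Candidate dates for the classic "Cheryl's Birthday" puzzle.
# Albert is told only the month; Bernard is told only the day.
DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]

def knows(part, value, candidates):
    """An agent told `value` for `part` (0 = month, 1 = day) knows the
    date iff exactly one remaining candidate matches what they were told."""
    return sum(1 for d in candidates if d[part] == value) == 1

# Statement 1: Albert doesn't know the date, and he knows Bernard
# doesn't either -- so his month contains no day that is unique overall.
s1 = [d for d in DATES
      if not knows(0, d[0], DATES)
      and all(not knows(1, e[1], DATES) for e in DATES if e[0] == d[0])]

# Statement 2: given statement 1, Bernard now knows the date.
s2 = [d for d in s1 if knows(1, d[1], s1)]

# Statement 3: given statement 2, Albert now knows the date too.
s3 = [d for d in s2 if knows(0, d[0], s2)]

print(s3)  # prints [('July', 16)]
```

Each statement about someone's knowledge is just another predicate applied to the shrinking candidate set, so a different puzzle with different "who knows what" structure only changes the filters, not the machinery.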
Let me say this. I am convinced I cannot write a program that solves the puzzle in 10 minutes.
I am convinced, though, that I can write such a program, including some test cases, with the help of an LLM like Bing Copilot in 10 minutes. The global reasoning/steps would be mine; the LLM would fill in the details.
I'm also convinced that it is only a matter of time (less than 5 years) before these kinds of problems are solved trivially by LLMs, without a prior example in the training set being necessary.
In other words, 'theory of mind' (of the type defined by the author of the article) has already emerged from machines.
People are a bit reluctant to believe that, me not so much.
> Now, your run-of-the-mill LLM can breeze through the Turing test.
Can they? You can ask arbitrary questions in the Turing test. I doubt many models would be able to successfully imitate humans under such adversarial conditions. Note that the Turing test doesn't require the judge to be unsophisticated or unknowledgeable about an AI's capabilities or weaknesses. I believe AIs are closer than ever to passing the Turing test, but I'm sceptical until I see it.
For me the simplest test would be to first ask for some specific knowledge, then ask where it learnt that knowledge, and check the reference. Currently they fail spectacularly, and the most useful next step would be source-aware training.
I am not trying to explain your specific brain. Next time people play the game werewolf in real life, join it for a couple of rounds, tell the players you're not too familiar with the game, and ask them to discuss mistakes after each round. You will notice they pay a lot of attention to who said what. If you don't pay attention you become like a villager lynching random people, while if the villagers pay enough attention they can prevent the werewolf from killing the whole village most of the time.
> Your average software engineer would probably fail to code up a python solution to this problem
[citation needed]. I say that if you can't write a program that solves this problem, you don't have any business calling yourself a "software engineer".