Hacker News

> At least with respect to this problem, they had no theory of mind.

This is very interesting and insightful, but I take issue with the above conclusion. Your average software engineer would probably fail to code up a Python solution to this problem. But most people would agree that the average software engineer, and the average person, possesses some theory of mind.

This seems to be a pattern I'm noticing with AI. The goalposts keep moving. When I was a kid, the Turing test was the holy grail for "artificial intelligence." Now, your run-of-the-mill LLM can breeze through the Turing test. But no one seems to care. "They are just imitating us, that doesn't count." Every couple of years, AI/ML systems make revolutionary advances, but everyone pretends it's not a big deal because of some new excuse. The latest one being "LLMs can't write a Python program to solve an entire class of very challenging logic problems. Therefore LLMs possess no theory of mind."

Let me stick my neck out and say something controversial. Are the latest LLMs as smart as Peter Norvig? No. Are they smarter than your average human? Yes. Can they outperform your average human at a randomly chosen cognitive task that has real-world applications? Yes. This is pretty darn revolutionary. We have crossed the Rubicon. We are watching history unfold in real time.



It is because the goalposts were wrong.

We once thought that a computer could not beat a grandmaster in chess or pass the Turing test without some undefined special human property. We were wrong about the computer needing this undefined special human property.

A spreadsheet has been much better at math than the average person for a long time too. A spreadsheet is a very useful human tool. LLMs are a revolutionary, useful tool. For some people that doesn't seem to be enough, though, and they feel compelled to insist that the LLM has the undefined special human property.


I consider myself a pretty average human programmer, and I was able to solve the logic puzzle and write a Python program for it in ~10 mins. [0]

I agree though, the people who are unable to solve this probably still have a theory of mind. It seems like we're setting a rather high bar.

[0] https://pastebin.com/q33K0HJ1


With all due respect, if you wrote a Python program for this in 10 minutes, you are not an average programmer.


Fair enough. Most of my peers could do it, but I guess they're not particularly average either.


Does that count as a program that solves the problem? Your program finds the unique days/months, but you're hardcoding the part where the program discerns who knows what.

Maybe that counts, I don't know, I'm genuinely asking.


He only specified that it should be flexible with respect to the specific dates, so I think so. If people knew different things it would be a different problem.

Norvig’s solution is very elegant, and basically establishes an API for declaring who knows what. I learn a lot about readability every time I read one of his programs.
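To make the "API for declaring who knows what" concrete: here is a minimal sketch of the epistemic-logic approach for the classic Cheryl's Birthday date set. This is not Norvig's actual code; the helper names (`know`, `tell`, the `statement*` predicates) are my own illustration of the idea that each public statement is just a filter over the remaining possible dates.

```python
# Classic Cheryl's Birthday dates: Albert is told the month, Bernard the day.
DATES = ["May 15", "May 16", "May 19", "June 17", "June 18",
         "July 14", "July 16", "August 14", "August 15", "August 17"]

def month(d): return d.split()[0]
def day(d):   return d.split()[1]

def know(possible):
    """An agent 'knows' the date when exactly one possibility remains."""
    return len(possible) == 1

def tell(part, value, dates):
    """Being told one part (month or day) narrows the candidate dates."""
    return [d for d in dates if part(d) == value]

def statement1(d, dates):
    # Albert: "I don't know the date, and I know Bernard doesn't either."
    alberts = tell(month, month(d), dates)
    return (not know(alberts)
            and all(not know(tell(day, day(x), dates)) for x in alberts))

def statement2(d, dates):
    # Bernard: "At first I didn't know, but now I do."
    after1 = [x for x in dates if statement1(x, dates)]
    return (not know(tell(day, day(d), dates))
            and know(tell(day, day(d), after1)))

def statement3(d, dates):
    # Albert: "Now I know too."
    after2 = [x for x in dates if statement2(x, dates)]
    return know(tell(month, month(d), after2))

solutions = [d for d in DATES
             if statement1(d, DATES) and statement2(d, DATES)
             and statement3(d, DATES)]
print(solutions)  # ['July 16']
```

Nothing is hardcoded about who said what beyond the statements themselves: swap in a different date list and the same filters still apply, which is the flexibility the grandparent comment is asking about.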


Let me say this. I am convinced I cannot write a program that solves the puzzle in 10 minutes.

I am convinced, though, that I can write such a program, including some test cases, with the help of an LLM like Bing Copilot in 10 minutes. The global reasoning/steps would be mine; the LLM would fill in the details.

I'm also convinced that it is only a matter of time (less than 5 years) before these kinds of problems are solved trivially by LLMs, without a prior example in the training set being necessary.

In other words, 'theory of mind' (of the type defined by the author of the article) has already emerged from machines.

People are a bit reluctant to believe that; me, not so much.


> Now, your run-of-the-mill LLM can breeze through the turing test.

Can they? You can ask arbitrary questions in the Turing test. I doubt many models would be able to successfully imitate humans in such adversarial conditions. Note that the Turing test doesn't require the judge to be unsophisticated or unknowledgeable about AI's capabilities or weaknesses. I believe that AIs are closer than ever to passing the Turing test, but I'm sceptical until I see it.


What kind of questions would you ask to distinguish?


For me the simplest way to test would be to first ask for specific knowledge, then ask where it learnt that knowledge, and check the reference. Currently they fail spectacularly, and the most useful next step would be to use source-aware training.


Why would I know where I learned a thing, much less be expected to produce a valid URL off the top of my head?


I am not trying to explain your specific brain. Next time people play the game Werewolf in real life, join it for a couple of rounds, tell the players you're not too familiar with the game, and ask them to discuss mistakes after each round. You will notice they pay a lot of attention to who said what. If you don't pay attention you become like a villager lynching random people, while if the villagers pay enough attention they can prevent the werewolf from killing the whole village most of the time.


Where did you learn that the capital of Virginia is Richmond?


The goalposts will continue to move until GDP improves.


Until whose GDP improves?

Suppose nation X's or power bloc Y's GDP improves due to ML; will nation Z, whose GDP doesn't increase, continue to move the goalposts?


> Your average software engineer would probably fail to code up a python solution to this problem

[citation needed]. I say that, if you can't write a program that solves this problem, you don't have any business calling yourself a "software engineer".



