> At least with respect to this problem, they had no theory of mind.
This is very interesting and insightful, but I take issue with the above conclusion. Your average software engineer would probably fail to code up a python solution to this problem. But most people would agree that the average software engineer, and the average person, possesses some theory of mind.
This seems to be a pattern I'm noticing with AI: the goalposts keep moving. When I was a kid, the Turing test was the holy grail of "artificial intelligence." Now your run-of-the-mill LLM can breeze through the Turing test, but no one seems to care. "They are just imitating us; that doesn't count." Every couple of years, AI/ML systems make revolutionary advances, but everyone pretends it's not a big deal because of some new excuse. The latest one: "LLMs can't write a Python program to solve an entire class of very challenging logic problems; therefore LLMs possess no theory of mind."
Let me stick my neck out and say something controversial. Are the latest LLMs as smart as Peter Norvig? No. Are they smarter than your average human? Yes. Can they outperform your average human at a randomly chosen cognitive task that has real-world applications? Yes. This is pretty darn revolutionary. We have crossed the Rubicon. We are watching history unfold in real time.
We once thought that a computer could not beat a grandmaster in chess or pass the Turing test without some undefined special human property. We were wrong about the computer needing this undefined special human property.
A spreadsheet has been much better at math than the average person for a long time too. A spreadsheet is a very useful human tool. LLMs are a revolutionary, useful tool. For some people, though, that doesn't seem to be enough, and they have to insist that the LLM has the undefined special human property.
Does that count as a program that solves the problem? Your program finds the unique days/months, but you're hardcoding the part where the program discerns who knows what.
Maybe that counts, I don't know, I'm genuinely asking.
He only specified that it should be flexible with respect to the specific dates, so I think so. If people knew different things it would be a different problem.
Norvig’s solution is very elegant, and basically establishes an API for declaring who knows what. I learn a lot about readability every time I read one of his programs.
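To make the "API for who knows what" idea concrete, here is a minimal sketch in that spirit. It is not Norvig's actual code; it uses the well-known "Cheryl's Birthday" puzzle as a stand-in, and the date list, function names, and structure are my own illustration. The key move is that each public statement becomes a filter over the remaining candidate dates, rather than being hardcoded.

```python
# Candidate dates for the classic "Cheryl's Birthday" puzzle.
# Albert is told only the month; Bernard is told only the day.
DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]

def knows(part, value, candidates):
    """An agent told `value` for `part` (0 = month, 1 = day) knows the
    date iff exactly one remaining candidate matches what they were told."""
    return sum(1 for d in candidates if d[part] == value) == 1

# Statement 1: Albert doesn't know the date, and he knows Bernard
# doesn't either -- so his month contains no day that is unique overall.
s1 = [d for d in DATES
      if not knows(0, d[0], DATES)
      and all(not knows(1, e[1], DATES) for e in DATES if e[0] == d[0])]

# Statement 2: given statement 1, Bernard now knows the date.
s2 = [d for d in s1 if knows(1, d[1], s1)]

# Statement 3: given statement 2, Albert now knows the date too.
s3 = [d for d in s2 if knows(0, d[0], s2)]

print(s3)  # prints [('July', 16)]
```

Each statement about someone's knowledge is just another predicate applied to the shrinking candidate set, so a different puzzle with different "who knows what" structure only changes the filters, not the machinery.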
Let me say this. I am convinced I cannot write a program that solves the puzzle in 10 minutes.
I am convinced, though, that I can write such a program, including some test cases, with the help of an LLM like Bing Copilot in 10 minutes. The global reasoning/steps would be mine; the LLM would fill in the details.
I'm also convinced that it is only a matter of time (less than 5 years) before these kinds of problems are solved trivially by LLMs, without a prior example in the training set being necessary.
In other words, 'theory of mind' (of the type defined by the author of the article) has already emerged from machines.
People are a bit reluctant to believe that, me not so much.
> Now, your run-of-the-mill LLM can breeze through the Turing test.
Can they? You can ask arbitrary questions in the Turing test. I doubt many models would be able to successfully imitate humans under such adversarial conditions. Note that the Turing test doesn't require the judge to be unsophisticated or unknowledgeable about an AI's capabilities or weaknesses. I believe AIs are closer than ever to passing the Turing test, but I'm sceptical until I see it.
For me the simplest test would be to first ask for some specific knowledge, then ask where it learnt that knowledge, and check the reference. Currently they fail spectacularly, and the most useful next step would be source-aware training.
I am not trying to explain your specific brain. Next time people play the game werewolf in real life, join it for a couple of rounds, tell the players you're not too familiar with the game, and ask them to discuss mistakes after each round. You will notice they pay a lot of attention to who said what. If you don't pay attention you become like a villager lynching random people, while if the villagers pay enough attention they can prevent the werewolf from killing the whole village most of the time.
> Your average software engineer would probably fail to code up a python solution to this problem
[citation needed]. I say that if you can't write a program that solves this problem, you don't have any business calling yourself a "software engineer".